# Export data to spreadsheet

Data in Neo4j is exported to a spreadsheet for further analysis.

In Neo4j, each `Respondent` node has a `CHOSE_ALT` relationship to the `QuestionAlternative` nodes it chose, and each `QuestionAlternative` has an incoming `CONSISTS_OF` relationship from its `Question`, i.e. the pattern `(:Respondent)-[:CHOSE_ALT]->(:QuestionAlternative)<-[:CONSISTS_OF]-(:Question)`.

The properties used in the export are:

- `Respondent.id`: identifies the respondent in the spreadsheet.
- `QuestionAlternative.id`: identifies the question alternative in the spreadsheet.
- `QuestionAlternative.name`: the text of the alternative.
- `QuestionAlternative.position`: the position of the alternative within its question.
- `Question.id`: identifies the question in the spreadsheet.
- `Question.name`: the text of the question.
- `Question.questionType`: the category of the question (exported as `questionCategory`).

## Connect to database

The cell should return the version number if the connection succeeds.

```python
from databaseconnection import DatabaseConnection

# `gds` is the client used for all queries below
gds = DatabaseConnection().get_database_connection()
gds.version()
```

```
'2.3.2'
```
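`DatabaseConnection` is not shown in this notebook. Since the object it returns provides `version()` and `run_cypher()`, it is presumably a `GraphDataScience` client from the `graphdatascience` package. A minimal sketch of such a helper, assuming the settings come from the hypothetical environment variables `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD`:

```python
import os


def read_connection_settings(env=None):
    """Read Neo4j connection settings from environment variables.

    NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD are hypothetical names chosen
    for this sketch; the defaults point at a local instance.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD", "")
    return uri, (user, password)


def get_database_connection():
    """Return a GraphDataScience client (assumes `graphdatascience` is installed)."""
    # Imported lazily so read_connection_settings() works without the package.
    from graphdatascience import GraphDataScience

    uri, auth = read_connection_settings()
    return GraphDataScience(uri, auth=auth)
```

Usage would mirror the cell above: `gds = get_database_connection()` followed by `gds.version()`. Keeping the credentials in the environment keeps them out of the notebook file.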

## Query data from database

```python
# Sheets and columns in the target spreadsheet:
#   Respondent: id, ... (one column per Question; the value is the position
#               of the QuestionAlternative the respondent chose)
#   QuestionAlternative: id, name, position, questionId
#   Question: id, name, questionCategory

# One row per (respondent, chosen alternative), together with the question
# that the alternative belongs to.
result = gds.run_cypher("""
    MATCH (respondent:Respondent)-[:CHOSE_ALT]->(questionAlternative:QuestionAlternative)<-[:CONSISTS_OF]-(question:Question)
    RETURN respondent.id, questionAlternative.id, questionAlternative.name, questionAlternative.position,
           question.id, question.name, question.questionType
    ORDER BY respondent.id, questionAlternative.id, questionAlternative.position
""")

print(result.head())
```

```
                          respondent.id                questionAlternative.id  \
0  00043e30-e387-44de-869b-ad46868697ff  04355686-af25-422e-8473-b7e5c0b20530   
1  00043e30-e387-44de-869b-ad46868697ff  04d750b1-8543-4843-ab8e-420dc2080b40   
2  00043e30-e387-44de-869b-ad46868697ff  0a1ddcf0-a35d-4b30-bad4-449aaf3d7216   
3  00043e30-e387-44de-869b-ad46868697ff  0cba82a7-d632-4db1-b323-7371568d9c88   
4  00043e30-e387-44de-869b-ad46868697ff  0e03c3b9-0b91-46dc-8f28-5c408a60c456   

   questionAlternative.name  questionAlternative.position  \
0                       Nej                             1   
1       Stämmer ganska väl                              1   
2   Mer än 1 gång i veckan                              1   
3    Stämmer ganska dåligt                              2   
4  Ungefär 1 gång i veckan                              2   

                            question.id  \
0  6a0267b8-5220-4db3-98a6-a0d5bacc5190   
1  b09df7bd-970f-4b47-be10-4289d90874dc   
2  4f356252-28f8-
```
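The spreadsheet step pivots this long-format result with `aggfunc='first'`, which silently keeps only one answer if a respondent is linked to several alternatives of the same question. A quick check before pivoting could look like the sketch below; the helper name `find_multiple_answers` is hypothetical and the sample rows are made up for illustration:

```python
import pandas as pd


def find_multiple_answers(result: pd.DataFrame) -> pd.DataFrame:
    """Return all rows where a respondent chose more than one alternative
    for the same question. Expects the column names returned by the query."""
    # keep=False marks every member of a duplicate group, not just the repeats
    dup_mask = result.duplicated(subset=["respondent.id", "question.id"], keep=False)
    return result[dup_mask]


# Made-up sample rows in the same shape as `result` (illustration only).
sample = pd.DataFrame(
    {
        "respondent.id": ["r1", "r1", "r1", "r2"],
        "question.id": ["q1", "q2", "q2", "q1"],
        "questionAlternative.id": ["a1", "a3", "a4", "a2"],
    }
)

conflicts = find_multiple_answers(sample)
print(conflicts)
```

If the check returns rows for the real data, it is worth deciding whether those questions allow multiple answers before relying on `aggfunc='first'`.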

## Generate spreadsheet from dataframe 

```python
import pandas as pd

# Respondent sheet: one row per respondent, one column per question,
# holding the text of the chosen alternative.
respondent_df = result.pivot_table(
    index='respondent.id',
    columns='question.name',
    values='questionAlternative.name',
    aggfunc='first',
).reset_index()

print(respondent_df.head())
print(respondent_df.columns)
print(respondent_df.shape)

# QuestionAlternative sheet (currently disabled):
# question_alternative_df = result[['questionAlternative.id', 'questionAlternative.name', 'questionAlternative.position', 'question.id', 'question.name']].drop_duplicates()
# print(question_alternative_df.head())
# print(question_alternative_df.columns)
# print(question_alternative_df.shape)

# Question sheet: one row per (question, alternative) pair.
question_df = result[['question.id', 'question.name', 'questionAlternative.id', 'questionAlternative.name', 'questionAlternative.position']].drop_duplicates()

print(question_df.head())
print(question_df.columns)
print(question_df.shape)

# Export to Excel (requires an Excel engine such as openpyxl to be installed).
with pd.ExcelWriter('output.xlsx') as writer:
    respondent_df.to_excel(writer, index=False, sheet_name='Respondent')
    # question_alternative_df.to_excel(writer, index=False, sheet_name='QuestionAlternative')
    question_df.to_excel(writer, index=False, sheet_name='Question')
```

```
question.name                         respondent.id  \
0              00043e30-e387-44de-869b-ad46868697ff   
1              0007c7ae-5ea1-490a-bd5f-c86dfbe41c14   
2              0039f241-24b8-454a-96a2-ca03aa30dd97   
3              004f500c-6047-4a5b-9bd1-98e45ca7e791   
4              006260e3-052a-4521-89aa-517b9bcabe38   

question.name  I min(a) klass(er) är det bra stämning   \
0                                                  NaN   
1                                                  NaN   
2                                                  NaN   
3                                                  NaN   
4                                                  NaN   

question.name  I vår klass arbetar eleverna hårt för att lära sig mycket  \
0                                         Stämmer ganska dåligt            
1                                         Stämmer ganska dåligt            
2                                            Stämmer ganska väl            
3               
```
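The sheet layout described in the query cell's comments differs slightly from what this cell writes: there, the Respondent sheet holds the *position* of the chosen alternative, and alternatives and questions go to separate `QuestionAlternative` and `Question` sheets. A minimal sketch of that layout, assuming `result` has the columns returned by the query; the helper `build_sheets` is hypothetical, keying the Respondent columns by `question.id` (rather than `question.name`) is an assumption, and the sample data is made up:

```python
import pandas as pd


def build_sheets(result: pd.DataFrame) -> dict:
    """Split the long-format query result into Respondent, QuestionAlternative
    and Question sheets. Sketch only: Respondent columns are keyed by question
    id, which is an assumption."""
    # Respondent: one row per respondent, one column per question id,
    # value = position of the chosen alternative.
    respondent = (
        result.pivot_table(
            index="respondent.id",
            columns="question.id",
            values="questionAlternative.position",
            aggfunc="first",
        )
        .reset_index()
        .rename(columns={"respondent.id": "id"})
    )
    respondent.columns.name = None  # drop the leftover "question.id" axis label

    # QuestionAlternative: id, name, position, questionId
    question_alternative = (
        result[["questionAlternative.id", "questionAlternative.name",
                "questionAlternative.position", "question.id"]]
        .drop_duplicates()
        .rename(columns={
            "questionAlternative.id": "id",
            "questionAlternative.name": "name",
            "questionAlternative.position": "position",
            "question.id": "questionId",
        })
    )

    # Question: id, name, questionCategory (taken from question.questionType)
    question = (
        result[["question.id", "question.name", "question.questionType"]]
        .drop_duplicates()
        .rename(columns={
            "question.id": "id",
            "question.name": "name",
            "question.questionType": "questionCategory",
        })
    )

    return {
        "Respondent": respondent,
        "QuestionAlternative": question_alternative,
        "Question": question,
    }


# Made-up sample rows in the same shape as `result` (illustration only).
sample = pd.DataFrame({
    "respondent.id": ["r1", "r1", "r2"],
    "questionAlternative.id": ["a1", "a3", "a2"],
    "questionAlternative.name": ["Yes", "Often", "No"],
    "questionAlternative.position": [1, 1, 2],
    "question.id": ["q1", "q2", "q1"],
    "question.name": ["Question 1", "Question 2", "Question 1"],
    "question.questionType": ["typeA", "typeB", "typeA"],
})

sheets = build_sheets(sample)
print(sheets["Respondent"])
```

Each returned frame could then be written inside the existing `pd.ExcelWriter` block with `df.to_excel(writer, index=False, sheet_name=name)` for every `name, df` in `sheets.items()`. Missing answers show up as `NaN`, so the position columns become floats.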