# Normality index

## Connect to the database

In [None]:
from databaseconnection import DatabaseConnection
gds = DatabaseConnection().get_database_connection()
gds.version()

## Create relationship between `QuestionAlternative` and `Respondent`

Create a `CHOSEN_BY` relationship from `QuestionAlternative` to `Respondent` wherever a `CHOSE` relationship already exists between the two nodes.

The `CHOSEN_BY` relationship has a `score` property, copied from the `nrOfRespondents` property on the `QuestionAlternative` node.


In [None]:
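# For every existing CHOSE relationship, MERGE the reverse CHOSEN_BY relationship
# and copy the alternative's nrOfRespondents onto it as `score`.
gds.run_cypher("""
                MATCH (q:QuestionAlternative)<-[c:CHOSE]-(r:Respondent)
                MERGE (q)-[cb:CHOSEN_BY]->(r)
                SET cb.score = q.nrOfRespondents
               """)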

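As a rough illustration of what the query above does, here is a minimal in-memory sketch in plain Python. The dictionaries, the sample ids (`r1`, `qa_yes`) and the helper `build_chosen_by` are hypothetical stand-ins, not part of the database or of any library.

```python
# Hypothetical toy data standing in for the graph (illustrative values only).
nr_of_respondents = {"qa_yes": 2, "qa_no": 1}  # QuestionAlternative.nrOfRespondents
chose = [("r1", "qa_yes"), ("r2", "qa_yes"), ("r3", "qa_no")]  # (respondent, alternative)


def build_chosen_by(chose, nr_of_respondents):
    """Return {(alternative, respondent): {"score": ...}} for every CHOSE pair."""
    chosen_by = {}
    for respondent, alternative in chose:
        # Keying by the node pair mimics MERGE: seeing a pair twice does not duplicate it.
        chosen_by[(alternative, respondent)] = {"score": nr_of_respondents[alternative]}
    return chosen_by


# Passing the pairs twice shows that duplicates are absorbed.
chosen_by = build_chosen_by(chose + chose, nr_of_respondents)
print(chosen_by)
```

Every respondent who chose the same alternative ends up with the same `score`, because the value comes from the alternative and not from the respondent.
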
## Create a relationship between `Respondent` and `Survey`

Create a relationship from `Respondent` to `Survey` called `HAS_PARTICIPATED` if the respondent has answered at least one question in the survey.

In [None]:
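# MERGE creates at most one HAS_PARTICIPATED relationship per (respondent, survey) pair,
# even when the respondent answered several questions in that survey.
gds.run_cypher("""
                MATCH (su:Survey)-[:HAS_QUESTION]->(qu:Question)-[:CONSISTS_OF]->(qa:QuestionAlternative)<-[:CHOSE]-(re:Respondent)
                MERGE (re)-[:HAS_PARTICIPATED]->(su);
               """)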

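The path traversal above can be mimicked on toy data to see why a respondent gets one relationship per survey. This is only a sketch in plain Python; the dictionaries and the helper `participations` are hypothetical and the ids are made up.

```python
# Hypothetical toy graph (illustrative ids only).
has_question = {"s1": ["q1", "q2"], "s2": ["q3"]}              # Survey -> Questions
consists_of = {"q1": ["a1", "a2"], "q2": ["a3"], "q3": ["a4"]}  # Question -> Alternatives
chose = {"r1": ["a1", "a3"], "r2": ["a4"], "r3": []}            # Respondent -> chosen Alternatives


def participations(has_question, consists_of, chose):
    """Return the set of (respondent, survey) pairs reachable via a chosen alternative."""
    # Walk Survey -> Question -> Alternative once to learn which survey owns each alternative.
    alt_to_survey = {
        alt: survey
        for survey, questions in has_question.items()
        for question in questions
        for alt in consists_of[question]
    }
    # A set plays the role of MERGE: the same pair is stored only once.
    return {
        (respondent, alt_to_survey[alt])
        for respondent, alts in chose.items()
        for alt in alts
    }


pairs = participations(has_question, consists_of, chose)
print(sorted(pairs))
```

Here `r1` answered two questions in `s1` but appears once, and `r3`, who chose nothing, does not appear at all.
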
## Create property in `QuestionAlternative` called `percentageOfRespondents`

Set a `percentageOfRespondents` property on each `QuestionAlternative`: the share of the survey's participants who chose that alternative, that is, `nrOfRespondents` divided by the survey's `totalParticipants`. The value is stored as a fraction (it is not multiplied by 100).

In [None]:
# `* 1.0` forces floating-point division, since dividing two integers in Cypher truncates.
gds.run_cypher("""
                MATCH (su:Survey)-[:HAS_QUESTION]->(qu:Question)-[:CONSISTS_OF]->(qa:QuestionAlternative)
                WITH su, qa, qa.nrOfRespondents as nrOfRespondents
                SET qa.percentageOfRespondents = nrOfRespondents * 1.0 / su.totalParticipants
                RETURN qa.name as questionAlternativeName, qa.percentageOfRespondents as percentageOfRespondents, su.name as surveyName;
               """)

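The same arithmetic, sketched in plain Python on made-up numbers. The function name `percentage_of_respondents` and the sample counts are assumptions for illustration only.

```python
def percentage_of_respondents(nr_of_respondents, total_participants):
    """Share of survey participants who chose an alternative, as a fraction."""
    # Multiplying by 1.0 mirrors the Cypher expression and keeps the result a float.
    return nr_of_respondents * 1.0 / total_participants


# Hypothetical survey with 40 participants and one question with two alternatives.
total_participants = 40
alternatives = {"a1": 30, "a2": 10}  # alternative -> nrOfRespondents

percentages = {
    name: percentage_of_respondents(count, total_participants)
    for name, count in alternatives.items()
}
print(percentages)  # {'a1': 0.75, 'a2': 0.25}
```
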
## Create a normality index

Create a normality index for each respondent in the data set. The normality index is calculated by the following formula:

$$
\frac{\sum_{i=1}^{n} \frac{1}{\text{nrOfAlternatives}_i} \cdot \text{score}_i}{\text{nrOfQuestions}}
$$

where $n$ is the number of questions in the survey (so $\text{nrOfQuestions} = n$), $\text{nrOfAlternatives}_i$ is the number of alternatives in question $i$, and $\text{score}_i$ is the score of the alternative chosen in question $i$. The query that follows stores, for each respondent, the average `percentageOfRespondents` of the alternatives that respondent chose.

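The formula can be transcribed literally into plain Python as a sanity check. This is only a sketch: the function name `normality_index` and the sample numbers are made up, and each `score` is simply taken as a given number.

```python
def normality_index(answers):
    """Evaluate the formula for one respondent.

    `answers` holds one (nr_of_alternatives, score) pair per question.
    """
    nr_of_questions = len(answers)
    # Sum of score_i / nrOfAlternatives_i over all questions.
    weighted_sum = sum(score / nr_of_alternatives for nr_of_alternatives, score in answers)
    return weighted_sum / nr_of_questions


# Hypothetical respondent: two questions with 2 and 4 alternatives, scores 4 and 8.
example = normality_index([(2, 4), (4, 8)])
print(example)  # (4/2 + 8/4) / 2 = 2.0
```
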
In [None]:
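# Average percentageOfRespondents over each respondent's chosen alternatives,
# store it on the respondent, and list respondents from lowest to highest index.
gds.run_cypher("""
                MATCH (re:Respondent)-[:CHOSE]->(qa:QuestionAlternative)
                WITH re, AVG(qa.percentageOfRespondents) as normalityIndex
                SET re.normalityIndex = normalityIndex
                RETURN re.id as respondentId, re.normalityIndex as normalityIndex
                ORDER BY normalityIndex;
               """)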
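
The query above averages `percentageOfRespondents` over the alternatives each respondent chose. A minimal in-memory sketch of that aggregation follows; the data and the helper `normality_indices` are hypothetical, and the fractions are invented for illustration.

```python
from statistics import mean

# Hypothetical toy data: alternative -> percentageOfRespondents, respondent -> chosen alternatives.
percentage = {"a1": 0.75, "a2": 0.25, "a3": 0.5}
chose = {"r1": ["a1", "a3"], "r2": ["a2", "a3"], "r3": []}


def normality_indices(chose, percentage):
    """Mean percentageOfRespondents of the chosen alternatives, per respondent."""
    return {
        respondent: mean(percentage[alt] for alt in alts)
        for respondent, alts in chose.items()
        if alts  # respondents with no CHOSE relationship never match the pattern
    }


indices = normality_indices(chose, percentage)
# Same ordering as the query: lowest normality index first.
ranking = sorted(indices.items(), key=lambda item: item[1])
print(ranking)  # [('r2', 0.375), ('r1', 0.625)]
```

A low value means the respondent mostly picked alternatives that few others picked; a high value means the respondent mostly agreed with the majority.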