## Data Analysis for Trees.app

This notebook provides an interactive analysis of how the Trees application stores data about users and about the content they use and create, in the service of providing students and other people with the resources they need to be successful.

#### Understanding the Questionnaire

Big Five Personality Traits:

1. Openness to Experience: High openness (curious) vs. Low openness (practical)
2. Conscientiousness: High (organized) vs. Low (spontaneous)
3. Extraversion: High vs. Low (Introversion)
4. Agreeableness: High (compassionate) vs. Low (competitive)
5. Neuroticism: High (emotionally sensitive) vs. Low (emotionally stable)

If you view each of these traits on a binary scale, there are 2^5 = 32 "general" user 
profiles that these five traits represent:

- practical, organized, introverted, competitive, stable
- practical, organized, extroverted, competitive, stable
- curious, organized, extroverted, compassionate, sensitive
- curious, spontaneous, introverted, compassionate, sensitive
- ...

These traits aren't binary: the slider questions offer 11 answer options on a scale
between these extremes. On this 11-point scale there are really 11^5 = 161,051
unique combinations.

However, simplifying to 32 makes it easier to develop baseline recommendations for
each generalized personality. By creating an object for each of the 32 "user-types" we
can:

1. Aim to predict the question answers that correlate with each user-type.
2. Process content to see which of the 32 user-types it is most aligned with.
3. Pull from this baseline to generate recommendations for content that falls within a specific user type.
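
The counting argument above can be checked with a short sketch. It assumes slider answers are integers from 0 to 10 and that the midpoint (5) is treated as the "low" side of a trait; both are illustrative assumptions, not something the Trees schema confirms. The names `TRAIT_LABELS`, `ALL_PROFILES`, and `to_profile` are hypothetical.

```python
from itertools import product

# Low/high labels for each Big Five trait, taken from the list above.
TRAIT_LABELS = {
    "openness": ("practical", "curious"),
    "conscientiousness": ("spontaneous", "organized"),
    "extraversion": ("introverted", "extroverted"),
    "agreeableness": ("competitive", "compassionate"),
    "neuroticism": ("stable", "sensitive"),
}

# Every combination of one label per trait: 2^5 = 32 generalized profiles.
ALL_PROFILES = list(product(*TRAIT_LABELS.values()))

# Assumption: answers are integers 0..10 and the midpoint counts as "low".
MIDPOINT = 5


def to_profile(scores):
    """Collapse a dict of 0..10 trait scores into one of the 32 profiles."""
    return tuple(
        labels[1] if scores[trait] > MIDPOINT else labels[0]
        for trait, labels in TRAIT_LABELS.items()
    )


example = {
    "openness": 2,
    "conscientiousness": 9,
    "extraversion": 1,
    "agreeableness": 3,
    "neuroticism": 0,
}

print(len(ALL_PROFILES))    # 32
print(11 ** 5)              # 161051
print(to_profile(example))  # ('practical', 'organized', 'introverted', 'competitive', 'stable')
```

Thresholding at the midpoint is only one possible way to binarize; a different cutoff, or a neutral middle band, would change which profile a borderline user lands in.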

In [None]:
# Objects that can be made for users based on data in the Trees database schema

class UserAnswersClass:
    def __init__(self, user_id):
        self.user_id = user_id
        self.answers = {}  # Dictionary with question_id as key and answer details as value

    def add_answer(self, question_id, answer, reactions_ms, timestamp):
        self.answers[question_id] = {
            "answer": answer,
            "reactions_ms": reactions_ms,
            "timestamp": timestamp
        }


class QuestionContentClass:
    def __init__(self, question_id, question_text, order=None):
        self.question_id = question_id
        self.question_text = question_text
        self.order = order


class ContentUserInteractionClass:
    def __init__(self, content_id):
        self.content_id = content_id
        self.user_interactions = []
        self.user_ratings = []
        self.user_step_feedback = []

    def add_interaction(self, interaction):
        self.user_interactions.append(interaction)

    def add_rating(self, score):
        self.user_ratings.append(score)

    def add_step_feedback(self, feedback):
        self.user_step_feedback.append(feedback)

# The idea behind these three classes is that each use case can be described by a combination of
# three objects, one per class, which can be passed into an OpenAI prompt that returns output
# personalized to that use case.

# Example (pseudocode, constructor arguments omitted):
# extrovert_wanting_better_grades = UserAnswersClass(...)
# extrovert_wanting_better_grades.add_answer(...)  # answers to the questions relating to that use case

# extrovert_questions = QuestionContentClass(...)
# education_questions = QuestionContentClass(...)
# habit_questions = QuestionContentClass(...)  # or: school_habits_qs = QuestionContentClass(...)

# hypothesis_user_interactions = ContentUserInteractionClass(...)
# Use human experience to hypothesize what these might be, i.e. which plans this type of
# user would interact with more than others.

# predicted_user_interactions = ContentUserInteractionClass(...)
# Use GPT to create the object and see what the AI would predict.

# actual_user_interactions = ContentUserInteractionClass(...)
# With enough data, the actual interactions can be used for supervised training of a
# recommendation engine that predicts content interactions for a user with this use case.


In [1]:
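import os

admin_email = 'admin@trees.app'
# Read the password from the environment instead of hardcoding it in the notebook.
# TREES_ADMIN_PASSWORD is a placeholder variable name; set it before running this cell.
admin_password = os.environ['TREES_ADMIN_PASSWORD']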

api_path = 'http://api-dev.trees.app/'

In [2]:
import json

def pretty_print_response(response):
    """
    Takes a response object and prints its JSON content in a readable format.
    
    Args:
    - response: A response object, like one from the requests library.
    """
    # Load JSON content from response
    content = response.json()
    
    # Pretty print the content
    print(json.dumps(content, indent=4, sort_keys=False))

# Example usage:
# response = requests.get('https://api.example.com/data')
# pretty_print_response(response)


Get the Access & Refresh Tokens for API access with admin credentials.

In [3]:
import requests

url = api_path + "account/token/"

headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "X-CSRFToken": "o9GfnnIqyB6rkOOMJYBvvLLtWWI1Y0bKxVbg5t32eUsTvccpDo2HZxV1B9qWXMNc"
}

data = {
    "email": admin_email,
    "password": admin_password  # Never hardcode or expose passwords like this.
}

response = requests.post(url, headers=headers, json=data)

# To see the response:
print(f'Status: {response.status_code}')
pretty_print_response(response)
refresh_token = response.json()["refresh"]
token = response.json()["access"]
print(f'\nAccess Token: {token}')

Status: 200
{
    "refresh": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoicmVmcmVzaCIsImV4cCI6MTY5ODM0OTI1MCwianRpIjoiZjU2NTA5YWI1Njg5NGZjNGIzYWFlOWE5MzYxMzU3ODEiLCJ1c2VyX2lkIjoxLCJpc19zdXBlcnVzZXIiOnRydWUsImlzX29yZ2FuaXphdGlvbl9hZG1pbiI6ZmFsc2V9.GW1-vSKGsfHLtZEhwjO1kcxpbqDC8lrgEdAyoGF3gaA",
    "access": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjk3NTcxNjUwLCJqdGkiOiI3NDQ1YTM2NThiZDc0ZDk1YTZkYzk1Yjg3ZDVjNjEwNSIsInVzZXJfaWQiOjEsImlzX3N1cGVydXNlciI6dHJ1ZSwiaXNfb3JnYW5pemF0aW9uX2FkbWluIjpmYWxzZX0.lJHgjEcQ0f6kb3rfWdCSSKl1BNnhFmte6vhozhB4464"
}

Access Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjk3NTcxNjUwLCJqdGkiOiI3NDQ1YTM2NThiZDc0ZDk1YTZkYzk1Yjg3ZDVjNjEwNSIsInVzZXJfaWQiOjEsImlzX3N1cGVydXNlciI6dHJ1ZSwiaXNfb3JnYW5pemF0aW9uX2FkbWluIjpmYWxzZX0.lJHgjEcQ0f6kb3rfWdCSSKl1BNnhFmte6vhozhB4464


In [4]:
url = api_path + "account/token/refresh/"

headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "X-CSRFToken": "o9GfnnIqyB6rkOOMJYBvvLLtWWI1Y0bKxVbg5t32eUsTvccpDo2HZxV1B9qWXMNc"
}

data = {
    "refresh": refresh_token
}

response = requests.post(url, headers=headers, json=data)

# To see the response:
print(f'Status: {response.status_code}')
pretty_print_response(response)
token = response.json()["access"]
print(f'\nAccess Token: {token}')

Status: 200
{
    "access": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjk3NTcxNjUyLCJqdGkiOiI4MTU0MTQ2YWY0MzA0MTU3ODUxZDA0ZDI4NjIwYTNmZCIsInVzZXJfaWQiOjEsImlzX3N1cGVydXNlciI6dHJ1ZSwiaXNfb3JnYW5pemF0aW9uX2FkbWluIjpmYWxzZX0.P4BqgVvosGTVvCsvfUsv06fXRPF0Qyub3cbfMM0DXlY"
}

Access Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjk3NTcxNjUyLCJqdGkiOiI4MTU0MTQ2YWY0MzA0MTU3ODUxZDA0ZDI4NjIwYTNmZCIsInVzZXJfaWQiOjEsImlzX3N1cGVydXNlciI6dHJ1ZSwiaXNfb3JnYW5pemF0aW9uX2FkbWluIjpmYWxzZX0.P4BqgVvosGTVvCsvfUsv06fXRPF0Qyub3cbfMM0DXlY


Let's get the account information as it is returned by api.trees.app/account/profile

In [5]:
url = api_path + "account/profile/"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {token}",
    "X-CSRFToken": "o9GfnnIqyB6rkOOMJYBvvLLtWWI1Y0bKxVbg5t32eUsTvccpDo2HZxV1B9qWXMNc"
}

response = requests.get(url, headers=headers)

print(f'Status: {response.status_code}')
pretty_print_response(response)

Status: 200
{
    "username": "admin",
    "email": "admin@trees.app",
    "profile_image": null,
    "name": "",
    "description": "Wo",
    "location": "wo",
    "password": "pbkdf2_sha256$216000$aT9JTy9MWJ3O$5K/LExZnVJp1d9FVM00rcC3WLuubFmcRZqB7tvkA0hI=",
    "email_verified": true,
    "university": 2,
    "id": 1,
    "is_organization_admin": false
}


Now let's log in through the super-admin endpoint so that we can get all the users on the app.

In [6]:

url = api_path + "super-admin/login/"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
    "X-CSRFToken": "o9GfnnIqyB6rkOOMJYBvvLLtWWI1Y0bKxVbg5t32eUsTvccpDo2HZxV1B9qWXMNc"
}

data = {
    "email": admin_email,
    "password": admin_password
}

response = requests.post(url, headers=headers, json=data)

status = response.status_code

print(f'Status: {status}')

if status != 200:
    print(f'Error: {response.text}')

pretty_print_response(response)

sap_token = response.json()['token']
sap_refresh_token = response.json()['refresh']

Status: 200
{
    "id": 1,
    "password": "pbkdf2_sha256$216000$aT9JTy9MWJ3O$5K/LExZnVJp1d9FVM00rcC3WLuubFmcRZqB7tvkA0hI=",
    "last_login": "2023-10-16T19:40:52.640733Z",
    "is_superuser": true,
    "username": "admin",
    "first_name": "",
    "last_name": "",
    "is_staff": true,
    "is_active": true,
    "date_joined": "2021-01-18T16:34:26Z",
    "profile_image": null,
    "email": "admin@trees.app",
    "is_organization_admin": false,
    "email_verified": true,
    "verify_email_sent": false,
    "name": "",
    "description": "Wo",
    "location": "wo",
    "university": 2,
    "groups": [],
    "user_permissions": [],
    "token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjk3NTcxNjU4LCJqdGkiOiJhMDc1NDc1OWJiZDk0YWVlYWQyMzVhMTc4OTkyM2I5NiIsInVzZXJfaWQiOjF9.gJzuRg83eDTgqq9cTPNC3zVJ3-W5CexiBnJNdxvu2Fc",
    "refresh": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoicmVmcmVzaCIsImV4cCI6MTY5ODM0OTI1OCwianRpIjoiNTk4OTYyYTYwY2NhNDc0

In [7]:
url = api_path + "super-admin/path/"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {sap_token}",
    "X-CSRFToken": "o9GfnnIqyB6rkOOMJYBvvLLtWWI1Y0bKxVbg5t32eUsTvccpDo2HZxV1B9qWXMNc"
}

response = requests.get(url, headers=headers)
status = response.status_code

print(f'Status: {status}')

if status != 200:
    print(f'Error: {response.text}')

paths = response
pretty_print_response(paths)



Status: 200
{
    "count": 565,
    "next": 2,
    "previous": null,
    "results": [
        {
            "id": 10,
            "details": [
                {
                    "id": 17,
                    "language": "en",
                    "title": "Staying energized without caffeinee30",
                    "description": "<p>I know, it seems like everyone drinks coffee. Maybe I'm an oddball, who knows. Personally, working at a desk can get monotonous and repetitive, even if I have many different tasks to work on. Since remote work began, I've been looking for a way to stay engaged with my projects minus the caffeine. Here is what has been working for me.</p>\n",
                    "path": 10
                }
            ],
            "creator": null,
            "university": {
                "id": 4,
                "name": "Not Available GH",
                "icon": "https://storage.googleapis.com/trees-userdata/university_icons/colorful-neon-computer-keyboards-wallpap

In [53]:
import pandas as pd

# Parse the JSON body of the `paths` response fetched from super-admin/path/
data = paths.json()

rows = []

# Extracting required data from each JSON object
for result in data['results']:
    plan_id = result['id']
    title = result['details'][0]['title']
    description = result['details'][0]['description']
    users_count = result['users_count']
    average_rating = result['average_rating']
    state = result['state']
    
    for step in result['steps']:
        step_name = step['step_details'][0]['step_name']
        step_description = step['step_details'][0]['description']

        # Constructing a dictionary and appending to the rows list
        rows.append({
            'plan_id': plan_id,
            'title': title,
            'description': description,
            'step_name': step_name,
            'step_description': step_description,
            'users_count': users_count,
            'average_rating': average_rating,
            'state': state
        })

# Creating a pandas DataFrame from the list of dictionaries
df = pd.DataFrame(rows)

display(df)

Unnamed: 0,plan_id,title,description,step_name,step_description,users_count,average_rating,state
0,10,Staying energized without caffeine,"I know, it seems like everyone drinks coffee. ...",Quick naps,"20 minutes or less, usually. For me, this is l...",34,4.0,approved
1,10,Staying energized without caffeine,"I know, it seems like everyone drinks coffee. ...",Moving around,Every hour or so I get up and move around a bi...,34,4.0,approved
2,10,Staying energized without caffeine,"I know, it seems like everyone drinks coffee. ...",Cold water,"You probably already know this, but most peopl...",34,4.0,approved
3,15,COVID testing in Halifax - I do not have symptoms,If you don't have symptoms of COVID 19 but wou...,Follow these instructions,Walk-in COVID testing is available at Zatzman ...,5,4.5,deactivated
4,16,COVID Testing in Halifax - I have symptoms,Nova Scotia offers online screening to determi...,Visit this website and complete the self-asses...,Individuals having symptoms will be securely c...,1,,deactivated
...,...,...,...,...,...,...,...,...
423,181,Four mental and physical health tips,Students attending college right now are facin...,Look for opportunities to serve,So many people have so many needs right now. T...,3,5.0,approved
424,182,Research using Patrick Power Library,<p>The library provides tons of valuable schol...,Using Novanet,"<p>When using Novanet, or another database, fo...",2,,moderation
425,182,Research using Patrick Power Library,<p>The library provides tons of valuable schol...,Research subject guides,"<p>For example, when you navigate to the accou...",2,,moderation
426,182,Research using Patrick Power Library,<p>The library provides tons of valuable schol...,Asking for help,<p>Email a librarian for assistance with your ...,2,,moderation


In [55]:
import pandas as pd

# Parse the JSON body of the `paths` response fetched from super-admin/path/
data = paths.json()

rows = []

# Extracting required data from each JSON object
for result in data['results']:
    # Plan-level fields; the per-step columns are added below
    row = {
        'plan_id': result['id'],
        'title': result['details'][0]['title'],
        'description': result['details'][0]['description'],
        'users_count': result['users_count'],
        'average_rating': result['average_rating'],
        'state': result['state'],
    }
    
    # Populating each step's data into the row dictionary
    for idx, step in enumerate(result['steps']):
        step_name_key = f'step_name_{idx+1}'
        step_description_key = f'step_description_{idx+1}'
        row[step_name_key] = step['step_details'][0]['step_name']
        row[step_description_key] = step['step_details'][0]['description']

    rows.append(row)

# Creating a pandas DataFrame from the list of dictionaries
df = pd.DataFrame(rows)

display(df)


Unnamed: 0,plan_id,title,description,users_count,average_rating,state,step_name_1,step_description_1,step_name_2,step_description_2,...,step_name_8,step_description_8,step_name_9,step_description_9,step_name_10,step_description_10,step_name_11,step_description_11,step_name_12,step_description_12
0,10,Staying energized without caffeine,"I know, it seems like everyone drinks coffee. ...",34,4.000000,approved,Quick naps,"20 minutes or less, usually. For me, this is l...",Moving around,Every hour or so I get up and move around a bi...,...,,,,,,,,,,
1,15,COVID testing in Halifax - I do not have symptoms,If you don't have symptoms of COVID 19 but wou...,5,4.500000,deactivated,Follow these instructions,Walk-in COVID testing is available at Zatzman ...,,,...,,,,,,,,,,
2,16,COVID Testing in Halifax - I have symptoms,Nova Scotia offers online screening to determi...,1,,deactivated,Visit this website and complete the self-asses...,Individuals having symptoms will be securely c...,,,...,,,,,,,,,,
3,18,Halifax COVID testing and information,"Need to get tested, feeling symptoms or just w...",4,5.000000,deactivated,See current restrictions and guidelines,https://novascotia.ca/coronavirus/,Testing information and scheduling,http://www.nshealth.ca/coronavirustesting,...,,,,,,,,,,
4,19,Use Access-A-Bus for those with disabilities,"<p>Access-A-Bus is a shared ride, door-to-door...",2,,moderation,How to register for Access-A-Bus,<p>You must apply for access-a-bus before beco...,How to request Access-A-Bus service,<p>Call 902-490-6999 to request Access-A-Bus s...,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,177,How to maintain mental and physical health in ...,This plan details tips for first-year students...,4,4.500000,moderation,Prioritize your sleep,Seemingly every day I advise students to sleep...,Exercise,Exercise can help with your sleep and also imp...,...,,,,,,,,,,
96,178,Tips to improve sleep,Most college students struggle with healthy sl...,6,5.000000,approved,Avoid caffeine and alcohol,"For many students, caffeine interferes with th...",Fine-tune your sleeping environment,"Noise, light, excessive heat or cold, drafts, ...",...,"Exercise regularly, three times or more per week",Studies confirm that people in good physical c...,,,,,,,,
97,179,Tips for getting good sleep,Check out these quick tips to help improve the...,13,4.666667,approved,Avoid alcohol and nicotine close to bedtime,Substances interfere with deep sleep.\r\nAltho...,Eat 2-3 hours before your planned bedtime,Avoid heartburn and other discomforts.\r\n\r\n...,...,Avoid caffeine before bed,Caffeine is a stimulant.\r\n\r\nThis means it ...,Lie down to go to sleep only when you feel sleepy,Don't ruin your sleep-friendly environment.\r\...,Create a sleep-friendly environment,"A sleep-friendly environment is dark, cool, qu...",,,,
98,181,Four mental and physical health tips,Students attending college right now are facin...,3,5.000000,approved,Get moving regularly,"With remote learning and physical distancing, ...",Stay connected,Isolation can be difficult for everyone—now is...,...,,,,,,,,,,


### Current Problems:

1. The /super-admin/path/ endpoint is not returning all the plans: the response reports a `count` of 565 and a `next` page, so the results appear to be paginated. I need the full set to start building dataframes for clustering the paths.
2. Unable to get users from /super-admin/users/ at all (the request returns 502 Bad Gateway). This is currently a **major** blocker.

### Next Steps:
1. Try different queries in the request to get ALL the plans from the /super-admin/path/ endpoint.
2. Write a series of functions to loop through each user and get their question answers, then store them in a dataframe.
3. Write a series of functions to loop through each user and get their path interaction data, then store them in a dataframe.
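
For the first step, one option is to follow the `next` field until it is null. The sketch below assumes `next` holds the next page number (as the `"next": 2` in the response above suggests) and leaves the actual HTTP call to a caller-supplied function. The name `fetch_all_results` and the `page` query parameter in the commented usage are assumptions, not confirmed parts of the Trees API.

```python
def fetch_all_results(fetch_page, max_pages=1000):
    """Collect 'results' from every page of a paginated endpoint.

    fetch_page(page_number) must return the decoded JSON body for that page,
    shaped like the response above: {"count", "next", "previous", "results"}.
    """
    results = []
    page = 1
    for _ in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        body = fetch_page(page)
        results.extend(body["results"])
        if not body.get("next"):  # null/None means this was the last page
            break
        page = body["next"]  # assumption: "next" holds the next page number
    return results


# Hypothetical wiring to the cells above; the "page" query parameter is an assumption:
# def fetch_page(page):
#     r = requests.get(api_path + "super-admin/path/", headers=headers, params={"page": page})
#     r.raise_for_status()
#     return r.json()
#
# all_paths = fetch_all_results(fetch_page)
```

Passing the fetcher in as a function keeps the loop testable without network access and makes it easy to swap the query parameter if the endpoint expects something else.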

In [47]:
# Running this cell seems to cause an error in the server, specifically in trees_api/web_admin/views/auth.py line 24 where it returns the Response(serializer.data):
# 19. def login(self, request):
# 20.     """ used to login the admin users """
# 21.     serializer = self.get_serializer_class()(data=request.data)
# 22.     serializer.is_valid(raise_exception=True)
# 23.     serializer.save()
# 24.     return Response(serializer.data)



url = "https://api.trees.app/super-admin/users/"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {sap_token}",
    "X-CSRFToken": "o9GfnnIqyB6rkOOMJYBvvLLtWWI1Y0bKxVbg5t32eUsTvccpDo2HZxV1B9qWXMNc"
}

response = requests.get(url, headers=headers)
status = response.status_code

print(f'Status: {status}')

if status != 200:
    print(f'Error: {response.text}')

pretty_print_response(response)

Status: 502
Error: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>



JSONDecodeError: Expecting value: line 1 column 1 (char 0)
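
The `JSONDecodeError` above comes from calling `response.json()` on the HTML error page that nginx returned. A minimal sketch of a guard is shown here; `describe_response` is a hypothetical helper, not part of the notebook or of `requests`. It relies only on the fact that the JSON decode errors raised by `response.json()` are subclasses of `ValueError`.

```python
import json


def describe_response(response):
    """Return a printable string for a response, whether or not its body is JSON."""
    try:
        content = response.json()
    except ValueError:  # JSON decode errors are ValueError subclasses
        # Fall back to a short excerpt of the raw body (e.g. an HTML error page).
        return f"Non-JSON body (status {response.status_code}): {response.text[:200]}"
    return json.dumps(content, indent=4, sort_keys=False)
```

Calling `print(describe_response(response))` in place of `pretty_print_response(response)` would then print the 502 page excerpt instead of raising.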

In [18]:
class User:
    def __init__(self, username, email):
        self.username = username
        self.email = email
        self.answers = {}  # Dictionary to store question-answer pairs
        self.plan_ratings = {}  # Dictionary to store plan-rating pairs
    
    def add_answer(self, question, answer):
        self.answers[question] = answer
    
    def add_rating(self, plan_name, rating):
        self.plan_ratings[plan_name] = rating
    
    def to_dict(self):
        return {
            'username': self.username,
            'email': self.email,
            'answers': self.answers,
            'plan_ratings': self.plan_ratings
        }

In [None]:
# Parse the most recent response once, then register the user by id
profile = response.json()
users = {profile["id"]: User(profile["username"], profile["email"])}
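
Next Steps item 2 calls for storing each user's answers in a dataframe. As a rough sketch, records shaped like the output of `User.to_dict()` could be flattened into one row per user and question. The function name `answers_to_rows` and the sample record are made up for illustration and do not come from the Trees database.

```python
def answers_to_rows(user_records):
    """Turn {user_id: record} into one row per (user, question) pair."""
    rows = []
    for user_id, record in user_records.items():
        for question, answer in record["answers"].items():
            rows.append({
                "user_id": user_id,
                "username": record["username"],
                "question": question,
                "answer": answer,
            })
    return rows


# Made-up sample shaped like User.to_dict(); not real Trees data.
sample = {
    1: {
        "username": "demo",
        "email": "demo@example.com",
        "answers": {"q1": 7, "q2": 3},
        "plan_ratings": {},
    },
}

rows = answers_to_rows(sample)
print(rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)`, the same pattern the plan cells above use.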
