Problem: Current and aspiring startup founders lack access to the resources they need to build their companies successfully.
Solution: Provide vetted, high-quality mentor services to guide users in their learning paths.

---------

The purpose of this project is to take a hypothetical scenario relevant to the role of a Product Analyst and carry out the procedures and analyses that would help improve and develop a product and the business's bottom line.

The scenario is as follows:
- You are a Product Analyst at a company currently offering products and services in the startup services industry.
- You have achieved strong growth since launching X years ago, and now have a user base of X.
- You have noticed, however, that retention rates are low.
   - Retention rates for all learning paths drop significantly after X sessions
   - Feedback from customers using the app frequently cites a lack of motivation to finish the learning path, and that ....
   - Users frequently request customer support, but often this is not due to technical difficulties. Instead, they have questions about the learning path itself (e.g. "I'm not sure if I'm supposed to feel anxious while doing this breathing exercise").
- You therefore brainstorm solutions with your team, and come up with the idea of introducing personal support for users by way of seasoned entrepreneurs and startup founders. The idea is that during their learning path, users will have the option to have a mentor guide them through the process, whether through a weekly call or simply an occasional exchange of messages.
- Your business has now gone from a one-dimensional B2C business to a two-sided marketplace model, where you source, vet, and match users with startup mentors to guide them in their learning paths.
- In this scenario however, there is no publicly available app data since that is proprietary. Instead, we will rely on the generation of synthetic data to guide the resolution of our hypothetical scenario.
   


   

The database will be structured as follows:
- users_info
   - uid
   - age
   - gender
   - location
   - 

- learning_paths
   - Type: fundraising, product development, design, team-building.
   - Tier: bronze, silver, gold.
   
   
- users_events
   - event_id
   - session_start
   - session_end
   - mood_before
   - mood_after
   - completion
   - support_request (whether the user used the chat support)
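
The table outline above can be sketched as empty pandas DataFrames. This is only a rough illustration: the column names come from the outline, while the dtypes and the idea that every type/tier combination exists as a learning path are assumptions.

```python
import pandas as pd

# Column names come from the outline above; the dtypes are assumptions.
users_info = pd.DataFrame({
    "uid": pd.Series(dtype="int64"),
    "age": pd.Series(dtype="int64"),
    "gender": pd.Series(dtype="object"),
    "location": pd.Series(dtype="object"),
})

# Assumption: one learning path per (type, tier) combination.
PATH_TYPES = ["fundraising", "product development", "design", "team-building"]
PATH_TIERS = ["bronze", "silver", "gold"]
learning_paths = pd.MultiIndex.from_product(
    [PATH_TYPES, PATH_TIERS], names=["type", "tier"]
).to_frame(index=False)

users_events = pd.DataFrame({
    "event_id": pd.Series(dtype="int64"),
    "session_start": pd.Series(dtype="datetime64[ns]"),
    "session_end": pd.Series(dtype="datetime64[ns]"),
    "mood_before": pd.Series(dtype="float64"),
    "mood_after": pd.Series(dtype="float64"),
    "completion": pd.Series(dtype="bool"),
    "support_request": pd.Series(dtype="bool"),
})
```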
   
   
   
   
---------

Generating user data requires assumptions about the probability distribution of each feature.

We start with users_info:
- age.
   - The user population is evenly distributed across ages 18-65, partly due to the product's focus on user interface.
- gender.
   - Women represent only 14% of solo founders.
   - Assume women (14%) and men (86%).
- location.
   - Predominantly US-based.
   - Web traffic: 37% US, 9% UK, 7% India, 6% Canada, 5% Germany, 36% other.
- subscription: yes, no.
   - Assume strong growth, with a paying customer base of 3.5% two years after founding.
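
A minimal sketch of how the users_info assumptions above might be sampled with NumPy. The seed, the `F`/`M` labels, and the reading of 3.5% as the share of paying users are assumptions; the user count of 1000 is the figure used later in these notes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility
N_USERS = 1000

# Location shares from the web-traffic assumption (they sum to 1.0).
locations = ["US", "UK", "India", "Canada", "Germany", "Other"]
location_p = [0.37, 0.09, 0.07, 0.06, 0.05, 0.36]

users_info = pd.DataFrame({
    "uid": np.arange(N_USERS),
    # Uniform over integer ages 18..65 inclusive (upper bound is exclusive).
    "age": rng.integers(18, 66, size=N_USERS),
    # 14% women, 86% men, as assumed above.
    "gender": rng.choice(["F", "M"], size=N_USERS, p=[0.14, 0.86]),
    "location": rng.choice(locations, size=N_USERS, p=location_p),
    # Assumption: 3.5% is read as the share of paying subscribers.
    "subscription": rng.random(N_USERS) < 0.035,
})
```

Calling `users_info["location"].value_counts(normalize=True)` afterwards is a quick way to check that the sampled shares sit close to the assumed ones.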

learning_paths
- type
- tier.
-


users_events
- event_id. quadratic distribution.
- session_start
   - datetime. quadratic distribution. 
      - Assumption 1: January to March is the most active period for VC investments, so users will access services more frequently during this period. There is also an uptick at the end of the year.
      - Assumption 2: Session times follow the 9-5 working day.
- session_end
   - datetime. quadratic distribution.
   - Session length ranges from 1 to 30 minutes.
   - The average session module lasts 8 minutes.
- session_rating
   - 
- completion
   - Few users will complete the modules. The distribution will probably be log-normal.
- support_request
   - Approximately 20% of users will request support.
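
One possible way to sample the session length, completion, and support-request features described above. The rescaled Beta distribution for session length is a stand-in chosen only because it respects the 1-30 minute range and the 8-minute mean. The log-normal parameters and the completion threshold are placeholders, and flagging support requests per event rather than per user is a simplification.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed
n_events = 2000  # event count stated in these notes

# Session length: Beta(a, b) rescaled to [1, 30] minutes.
# The Beta mean is a / (a + b); with a / (a + b) = 7 / 29 the rescaled
# mean is 1 + 29 * 7 / 29 = 8 minutes.
a = 2.0
b = a * 22.0 / 7.0
duration_min = 1.0 + 29.0 * rng.beta(a, b, size=n_events)

# Completion: log-normal score, so most values are small and few are large.
# The parameters and the threshold of 3.0 are placeholders.
completion_score = rng.lognormal(mean=0.0, sigma=1.0, size=n_events)
completed = completion_score > 3.0

# Support requests: roughly 20% of events carry a support request
# (simplification: the rate above is stated per user, not per event).
support_request = rng.random(n_events) < 0.20
```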
   
  
  
- 1000 users.
- 12 months.
- 4 types of users:
   - Inactive: 0 events/day
   - Casual: 1 event/day
   - Frequent: 2 events/day
   - Power: 3+ events/day
- Each user will have 2 sessions/day on average.
- Total number of events = 2000 per day (1000 users × 2 sessions/day).
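
A rough sketch of how the four user types could be derived from daily event counts. The Poisson distribution with mean 2 is an assumption chosen to match the stated average of 2 sessions/day, since the notes do not specify how counts are distributed. `user_type` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed
N_USERS = 1000

# Assumption: events per user per day follow Poisson(2).
events_per_day = rng.poisson(lam=2.0, size=N_USERS)


def user_type(n_events: int) -> str:
    """Map a daily event count to the four user types listed above."""
    if n_events == 0:
        return "Inactive"
    if n_events == 1:
        return "Casual"
    if n_events == 2:
        return "Frequent"
    return "Power"  # 3 or more events per day


types = np.array([user_type(n) for n in events_per_day])
```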

- Users are most active in the morning and in the evening, with a preference for the evening between 20:00 and 23:00.
- Users are most active at the start and at the end of the year, with a skew towards the start of the year.
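
The two activity patterns above could be sampled roughly as follows. This is a sketch under stated assumptions: the peak centres, spreads, the 60/40 evening/morning split, the monthly weights, and the year 2023 are all illustrative values not given in these notes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)  # arbitrary seed
n_events = 2000

# Hour of day: mixture of a morning peak and a heavier evening peak.
is_evening = rng.random(n_events) < 0.6
hour = np.where(
    is_evening,
    rng.normal(21.5, 1.0, n_events),  # evening peak inside 20:00-23:00
    rng.normal(8.0, 1.5, n_events),   # morning peak
)
hour = np.clip(hour, 0.0, 23.99)
minute_of_day = (hour * 60).astype(int)

# Month: heavier weights at the start and end of the year, skewed to the start.
month_w = np.array([5, 4, 4, 2, 1, 1, 1, 1, 1, 2, 3, 3], dtype=float)
month = rng.choice(np.arange(1, 13), size=n_events, p=month_w / month_w.sum())

# Day is capped at 28 so every month/day pair is a valid date.
day = rng.integers(1, 29, size=n_events)
dates = pd.to_datetime(pd.DataFrame({"year": 2023, "month": month, "day": day}))
session_start = dates + pd.to_timedelta(minute_of_day, unit="min")
```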

In [1]:
import numpy
import pandas
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
%matplotlib inline

In [None]:
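'''Generating unique event id.'''
# profiles_df is assumed to be defined elsewhere in the notebook.
# A random permutation of 0..n-1 gives exactly one unique id per row.
event_id = numpy.random.permutation(len(profiles_df))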

In [None]:
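'''Generating user id.'''
# Assumes profiles_df (defined elsewhere) holds one row per user.
# Sequential integers are used because ids drawn from a normal distribution would be non-unique floats.
user_id = numpy.arange(len(profiles_df))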

In [None]:
'''Generating session_start.'''

# Set the mean and standard deviation for the normal distribution of the time values.
mean_time = datetime.strptime('12:00:00', '%H:%M:%S')
std_dev_time = timedelta(minutes=30)

# Generate a sample of 10000 time values with a normal distribution.
