## Questions to pick from
1. What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter, and how do these features differ between various player types?
2. We would like to know which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts.
3. We are interested in demand forecasting, namely, what time windows are most likely to have a large number of simultaneous players. This is because we need to ensure that the number of licenses on hand is sufficiently large to accommodate all parallel players with high probability.

#### Question: At what time of day are players most likely to log in, and which month is most active?

Variables to use:
- start_time (separate this into month and hour)
#### How to wrangle data
- Make sure all data is from 2024
- Count sessions by month and by hour to see which **month is the most popular** (*probably summer*) and which **hour** has the most logins
- Player-level data is less important here, so `hashedEmail` can be dropped
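
The wrangling steps above can be sketched on a small sample before running them on the full file. This is only an illustration: the three rows are made up, the column name `start_hour` is hypothetical, and the `DD/MM/YYYY HH:MM` format is an assumption based on the day-first parsing used in the notebook.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from project_data/sessions.csv
sample = pd.DataFrame({
    'start_time': ['30/06/2024 18:12', '17/06/2024 23:33', '25/07/2024 17:34'],
})

# Parse with an explicit format (assumed day-first, DD/MM/YYYY HH:MM)
start = pd.to_datetime(sample['start_time'], format='%d/%m/%Y %H:%M')

# Check that every session falls in 2024 before ignoring the year
all_2024 = bool((start.dt.year == 2024).all())

# Separate the timestamp into month and hour
sample['start_month'] = start.dt.month
sample['start_hour'] = start.dt.hour
```

Checking the year explicitly, rather than assuming it, makes the "all data is from 2024" step easy to re-verify if the file changes.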

In [33]:
import pandas as pd
import altair as alt
alt.data_transformers.enable('vegafusion')

month_order = [
    'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'
]

sessions_raw = pd.read_csv('project_data/sessions.csv')

# keep only the start and end timestamps; player identifiers are not needed here
sessions_times = sessions_raw[['start_time', 'end_time']].rename(
    columns = {'start_time': 'date_column_start', 'end_time': 'date_column_end'}
)

# convert to datetime
sessions_times['date_column_start'] = pd.to_datetime(sessions_times['date_column_start'], dayfirst = True)
sessions_times['date_column_end'] = pd.to_datetime(sessions_times['date_column_end'], dayfirst = True)

# split the start timestamp into year, month, and hour of day
sessions_times['start_year'] = sessions_times['date_column_start'].dt.year
sessions_times['start_month'] = sessions_times['date_column_start'].dt.month
sessions_times['start_time'] = sessions_times['date_column_start'].dt.hour

# only the start hour is needed, but keep the end hour in case we want to look at session length
sessions_times['end_time'] = sessions_times['date_column_end'].dt.hour

# chronological order of months
sessions_times = sessions_times.sort_values('start_month')

# dropping columns
sessions_new = sessions_times.drop(['date_column_start', 'date_column_end'], axis = 1)

# all data is already from 2024, so this filter is not needed:
# year2024 = [2024]
# sessions_wr = sessions_new[sessions_new['start_year'].isin(year2024)]

# count sessions per month to see player logins
sessions_log_month = (
    sessions_new['start_month']
    .value_counts()
    .reset_index()
)
sessions_log_month.columns = ['month', 'num_logins']
sessions_log_month['month'] = pd.to_datetime(sessions_log_month['month'], format = '%m').dt.month_name()

sessions_log_month_plot = alt.Chart(sessions_log_month).mark_bar(color = '#CE71D9').encode(
    x = alt.X('month', axis = alt.Axis(labelAngle = -45), sort = month_order).title('Month'),
    y = alt.Y('num_logins:Q').title('Number of Logins')
).properties(
    width = 350
).configure_axis(
    labelFontSize = 12,
    titleFontSize = 15,
)

# count sessions per start hour to see player logins
sessions_log_hour = (
    sessions_new['start_time']
    .value_counts()
    .reset_index()
)
sessions_log_hour.columns = ['hour', 'num_logins']

sessions_log_hour_plot = alt.Chart(sessions_log_hour).mark_bar(color = '#71BDD9').encode(
    x = alt.X('hour').title('Hour of the Day (24h clock)'),
    y = alt.Y('num_logins:Q').title('Number of Logins'),
).properties(
    width = 350
).configure_axis(
    labelFontSize = 12,
    titleFontSize = 15,
)


In [34]:
sessions_log_hour_plot

In [35]:
sessions_log_month_plot
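
Counting logins by start hour shows when players arrive, but it does not directly measure how many are online at the same moment, which is what the license question asks about. One possible way to estimate peak simultaneous sessions is sketched below, assuming each session has a parsed start and end timestamp. The sample rows and the helper name `peak_concurrency` are hypothetical and not part of the analysis above.

```python
import pandas as pd

def peak_concurrency(starts, ends):
    """Return the maximum number of sessions that overlap at any moment."""
    events = pd.concat([
        pd.DataFrame({'time': starts, 'change': 1}),   # a session begins
        pd.DataFrame({'time': ends, 'change': -1}),    # a session finishes
    ])
    # Sort by time; at equal times, ends (-1) are processed before starts (+1)
    events = events.sort_values(['time', 'change'])
    # The running total is the number of sessions open after each event
    return int(events['change'].cumsum().max())

# Hypothetical sample sessions
sample = pd.DataFrame({
    'start': pd.to_datetime(['2024-06-30 18:00', '2024-06-30 18:10', '2024-06-30 18:40']),
    'end': pd.to_datetime(['2024-06-30 18:30', '2024-06-30 19:00', '2024-06-30 18:50']),
})
peak = peak_concurrency(sample['start'], sample['end'])
```

On the real data this would likely be applied to `date_column_start` and `date_column_end` after dropping rows with a missing end time, which is an assumption about the data rather than something checked here.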