### Title
- Introduction:
    - provide some relevant background information on the topic so that someone unfamiliar with it will be prepared to understand the rest of your report
    - clearly state the question you tried to answer with your project
    - identify and fully describe the dataset that was used to answer the question
### Methods & Results:
- describe, from beginning to end, the methods you used to perform your analysis, narrating the analysis code.
- your report should include code which:
    - loads data 
    - wrangles and cleans the data to the format necessary for the planned analysis
    - performs a summary of the data set that is relevant for exploratory data analysis related to the planned analysis 
    - creates a visualization of the dataset that is relevant for exploratory data analysis related to the planned analysis
    - performs the data analysis
    - creates a visualization of the analysis 
    - note: all figures should have a figure number and a legend
### Discussion:
- summarize what you found
- discuss whether this is what you expected to find
- discuss what impact such findings could have
- discuss what future questions this could lead to
### References
- You may include references if necessary, as long as they all have a consistent citation style.

### Introduction

The Pacific Laboratory for Artificial Intelligence (PLAI) is a research group in the Department of Computer Science at the University of British Columbia. At its core, the group's research focuses on artificial intelligence (1). In its most recent project, PLAI has been collecting data from players on a Minecraft server to create an advanced AI that can respond like a real person (2). The goal is to develop learning algorithms and scaffolding for embodied AI, that is, intelligent systems with a virtual or physical body whose sensing, decision-making, and acting allow them to interact with real-world or simulated environments. Such AI could be used for Minecraft NPCs and in other fields.

This group report will attempt to answer the following question with the provided datasets:

#### **Can we predict the length of each session based on the log in time and day of the week?**

The datasets we will be working with are summarized in the tables below.

#### Table 1. Players dataset
|Variable Name|Type|Description|
|---|---|---|
|experience|Categorical|Self-reported skill level of player|
|subscribe|Categorical|Whether or not the player is subscribed to the newsletter|
|hashedEmail|Categorical|Unique player identification|
|played_hours|Quantitative|Total hours of playtime|
|name|Categorical|Chosen name of player|
|gender|Categorical|Chosen gender of player|
|age|Quantitative|Reported age of player|
|individualId|N/A|No information|
|organizationName|N/A|No information|

#### Table 2. Sessions dataset
|Variable Name|Type|Description|
|---|---|---|
|hashedEmail|Categorical|Unique player identification|
|start_time|Temporal|Timestamp of when player logs in (day/month/year/hour)|
|end_time|Temporal|Timestamp of when player logs off (day/month/year/hour)|
|original_start_time|Temporal|Timestamp of the exact second the player has logged on since the server started running|
|original_end_time|Temporal|Timestamp of the exact second the player has logged off since the server started running|

### Methods and Results

The relevant columns from Table 2 are start_time and end_time; all other columns can be dropped.

In [5]:
import pandas as pd
import altair as alt

# Loading the data from the internet

url = 'https://drive.google.com/uc?export=download&id=14O91N5OlVkvdGxXNJUj5jIsV5RexhzbB'
sessions = pd.read_csv(url)

# Dropping unnecessary data

sessions_times = sessions.drop(['hashedEmail', 'original_start_time', 'original_end_time'], axis = 1)

In the following code, we clean the data so that we can perform our analysis.

In [8]:
# Converting to datetime to handle data properly

sessions_times['start_time'] = pd.to_datetime(sessions_times['start_time'], dayfirst = True)
sessions_times['end_time'] = pd.to_datetime(sessions_times['end_time'], dayfirst = True)

# Creating a new column for day of the week with Monday as 0 and Sunday as 6

sessions_days = sessions_times.assign(day_of_week = sessions_times['start_time'].dt.dayofweek)

# Calculating the duration of each login in minutes

sessions_days['duration_minutes'] = (sessions_days['end_time'] - sessions_days['start_time']) / pd.Timedelta(minutes=1)

# Calculating the average duration grouped by the day of the week

avg_duration = sessions_days.groupby('day_of_week')['duration_minutes'].mean().reset_index()
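
The wrangling steps above can be checked on a small example. The sketch below uses three made-up sessions (not rows from the real dataset) and assumes the timestamps follow a day-first `DD/MM/YYYY HH:MM` layout. It also adds a hypothetical `start_hour` column, since our question uses the login time as well as the day of the week; that column is an illustration and is not part of the analysis code above.

```python
import pandas as pd

# Made-up sessions for illustration only; these are NOT rows from the real dataset
toy = pd.DataFrame({
    'start_time': ['30/06/2024 18:12', '01/07/2024 09:00', '01/07/2024 23:30'],
    'end_time':   ['30/06/2024 18:24', '01/07/2024 10:30', '02/07/2024 00:15'],
})

# Parse the strings as day-first timestamps (assumed format)
toy['start_time'] = pd.to_datetime(toy['start_time'], dayfirst=True)
toy['end_time'] = pd.to_datetime(toy['end_time'], dayfirst=True)

# Day of the week (Monday = 0, Sunday = 6) and hour of login (0-23)
# start_hour is a hypothetical column name introduced for this sketch
toy = toy.assign(
    day_of_week=toy['start_time'].dt.dayofweek,
    start_hour=toy['start_time'].dt.hour,
)

# Session length in minutes; sessions that cross midnight are handled correctly
toy['duration_minutes'] = (toy['end_time'] - toy['start_time']) / pd.Timedelta(minutes=1)

# Average session length per day of the week
toy_avg = toy.groupby('day_of_week')['duration_minutes'].mean().reset_index()
print(toy)
print(toy_avg)
```

Checking a toy table like this by hand (for example, that the session starting at 23:30 and ending at 00:15 lasts 45 minutes) is a quick way to confirm that `dayfirst = True` is parsing the dates the way we expect.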

With our data wrangled, we can now make relevant visualizations.

### References

1. Pacific Laboratory for Artificial Intelligence (PLAI). https://plai.cs.ubc.ca/
2. PLAICraft. https://plaicraft.ai/

### Questions
#### Nico: Can we predict when players are most likely to log on based on skill level?
#### Jasper: When are players most likely to be active, and what traits are associated with heavier playtime?
#### Gabriel: Can a player's skill level serve as a reliable predictor of their typical login times?

#### Jonah: Can we predict the length of each session based on the log in time and day of the week?