# Getting started with Pandas TimeSeries

This notebook introduces the basics of working with datetime data in Pandas.    
The following five points will be covered:

1. [Parsing DateTime](#task1)
2. [Aggregating columns](#task2)
3. [Extracting DateTime properties](#task3)
4. [Filtering and selecting specific durations](#task4)
5. [Changing the granularity of the Timeseries](#task5)


### Prepare environment and read data 

In [None]:
# Constants 
INPUT_PATH = '/kaggle/input/netflix-shows/netflix_titles.csv'

# Libraries 
import pandas as pd 
import matplotlib.pyplot as plt

# Set default properties for plotting 
plt.rcParams['figure.figsize'] = [11, 4]
plt.rcParams['figure.dpi'] = 100 

In [None]:
# Read data and display 5 random entries 
raw_df = pd.read_csv(INPUT_PATH)
raw_df.sample(5)

_____

## Task 1: Count the number of shows added per day     

In the following section, we will parse the raw date strings into      
pandas datetime values and count the number of shows added on each day.

### Parse timestamp into datetime column <a id='task1'></a>

Convert the raw string column to the pandas datetime format.    
Once the column has a datetime dtype, we will be able to    
apply the datetime-specific functionality illustrated below.


In [None]:
df = raw_df.copy()

In [None]:
df["date_added"] = pd.to_datetime(df["date_added"])

In [None]:
df.dtypes
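
If the raw strings are not perfectly clean, the conversion may need a little help. The sketch below uses made-up values (not taken from the dataset) and assumes a `Month day, year` layout. It strips stray whitespace, passes an explicit `format`, and uses `errors="coerce"` so that unparseable or missing entries become `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw values: the second has a stray leading space, the last is missing
raw_dates = pd.Series(["September 25, 2021", " August 4, 2017", None])

# Strip whitespace, then parse with an explicit format.
# errors="coerce" turns anything that cannot be parsed into NaT.
parsed = pd.to_datetime(raw_dates.str.strip(), format="%B %d, %Y", errors="coerce")

print(parsed.dtype)         # a datetime64 dtype
print(parsed.isna().sum())  # number of entries that could not be parsed
```

Checking the number of `NaT` values after parsing is a quick way to see how many rows were lost in the conversion.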

### Count shows added per date <a id='task2'></a>
All the shows have been listed in the original dataframe.     
Now let's count the total number of shows added per day

In [None]:
show_count = df.groupby("date_added")[["show_id"]].count() # with double brackets [[...]] the output is a DataFrame

In [None]:
show_count.head()

In [None]:
show_count.index
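
To see what the double brackets change, here is a minimal sketch on a toy frame (the values are invented for illustration). Selecting with `[["show_id"]]` returns a DataFrame, while `["show_id"]` returns a Series. In both cases the grouping column becomes the index of the result.

```python
import pandas as pd

# Toy data: two shows added on the same day, one on the next day
toy = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "date_added": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02"]),
})

as_frame = toy.groupby("date_added")[["show_id"]].count()   # DataFrame
as_series = toy.groupby("date_added")["show_id"].count()    # Series

print(type(as_frame).__name__, type(as_series).__name__)
print(type(as_frame.index).__name__)  # the grouped dates form a DatetimeIndex
```

Keeping a DataFrame is convenient here because we add more columns to it in the next task.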

______

## Task 2: Extract the day name and sum up the shows added 
<a id='task3'></a>

In the last step, we have used the `date_added` column to count the number of shows.    
Since we've used the `groupby` functionality to count the number of shows,     
the column is set as our index. 

We can now use our new index directly to extract properties of the timestamps.    
One example is the day of the week, returned by the `day_name()` method.    
Check out the [full list of the attributes here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html).

In [None]:
show_count['days'] = show_count.index.day_name()

In [None]:
show_count.head()

In [None]:
show_count.groupby("days").sum()
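
Note that `groupby("days")` sorts the day names alphabetically, not in calendar order. The sketch below, on invented counts, shows a few other index properties and one possible way to restore the weekday order with `reindex`. The names `counts`, `per_day` and `weekday_order` are only illustrative.

```python
import pandas as pd

# Toy daily counts: 2021-01-04 and 2021-01-11 are Mondays, 2021-01-05 is a Tuesday
idx = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-11"])
counts = pd.DataFrame({"show_id": [3, 1, 2]}, index=idx)

# A DatetimeIndex exposes many properties, for example:
counts["days"] = counts.index.day_name()      # "Monday", "Tuesday", ...
counts["weekday"] = counts.index.dayofweek    # Monday=0 ... Sunday=6
counts["month"] = counts.index.month          # 1 ... 12

# Sum per day name, then put the result in calendar order
per_day = counts.groupby("days")["show_id"].sum()
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
per_day_ordered = per_day.reindex(weekday_order, fill_value=0)
print(per_day_ordered)
```

Days that never occur in the data are filled with 0 here, which keeps the result easy to plot as a bar chart.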

______

## Task 3: Select data from 2016 onwards 
<a id='task4'></a>

You can also use regular boolean masking to select and filter entries.      
The syntax is even simpler than one could expect. You don't even need to convert    
your filtering criteria to `datetime`. A simple string in `YYYY-MM-DD` format     
(or a prefix of it, such as `YYYY`) will do the job.  


In [None]:
show_count["show_id"].plot()

In [None]:
# ">=" keeps entries dated exactly 2016-01-01; the string "2016" is parsed as that date
show_count = show_count[show_count.index >= "2016"]
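
The boundary matters when comparing against a date string: `"2016"` is interpreted as midnight on 2016-01-01, so `>=` includes that day and `>` excludes it. The sketch below, on a small invented series, also shows label-based selection with `.loc` as an alternative to masking.

```python
import pandas as pd

# Toy series with one entry exactly on the boundary date
idx = pd.to_datetime(["2015-12-31", "2016-01-01", "2016-06-15", "2017-03-01"])
s = pd.Series([1, 2, 3, 4], index=idx)

from_2016 = s[s.index >= "2016"]           # includes 2016-01-01
after_boundary = s[s.index > "2016"]       # excludes 2016-01-01
only_2016 = s.loc["2016"]                  # every entry within the year 2016
first_half = s.loc["2016-01-01":"2016-06-30"]  # both slice ends are inclusive

print(from_2016.tolist(), after_boundary.tolist())
print(only_2016.tolist(), first_half.tolist())
```

Selecting with `.loc` and a partial date string is handy when you want a whole year or month without writing both comparison bounds yourself.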

______

## Task 4: Sum up weekly data 
<a id='task5'></a>

It is possible to change the granularity of your time series directly using the Pandas datetime functionality.        

To do that, you need to specify two things: 
- Your new granularity passed as an argument to the `resample` function. [Read more details](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects)
- The aggregation function applied to the values that fall into each new period (for example `sum` or `count`)

In [None]:
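# Sum the daily counts within each week ("W" = weekly bins); .count() would only count the days with entries
show_count["show_id"].resample("W").sum()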
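
The choice of aggregation function changes what the weekly numbers mean. The following sketch, on invented daily counts, compares `sum()` (total shows per week) with `count()` (number of days with entries per week). With the `"W"` alias, each bin is labelled by the Sunday that ends the week.

```python
import pandas as pd

# Toy daily counts: two days in the first week, one day in the second
idx = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-11"])
daily = pd.Series([3, 1, 2], index=idx, name="show_id")

weekly_sum = daily.resample("W").sum()      # total shows added per week
weekly_count = daily.resample("W").count()  # days with at least one entry per week

print(weekly_sum)
print(weekly_count)
```

Other aggregations such as `mean()` or `max()` can be applied in the same way, depending on the question you want the coarser series to answer.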