# Working With Dates









---



# Setup

In [2]:
## imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Facebook Yahoo Stock Data


### Import the data FB.csv

Upload the file `FB.csv` from Q-Tools to your server/computer.  

> It is usually in the `Downloads` folder

> If you are using `Colab`, upload this file using the upload option in the left-hand nav


In [3]:
## import the file
fb = pd.read_csv("/Users/Kyle_Staples/Documents/GitHub/IS834/datasets/FB.csv")

In [4]:
fb.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2014-03-20,68.010002,68.230003,66.82,66.970001,66.970001,44439000
1,2014-03-21,67.529999,67.919998,66.18,67.239998,67.239998,59999900
2,2014-03-24,67.190002,67.360001,63.360001,64.099998,64.099998,85696000
3,2014-03-25,64.889999,66.190002,63.779999,64.889999,64.889999,68786000
4,2014-03-26,64.739998,64.949997,60.369999,60.389999,60.389999,97503900


In [5]:
## peek
fb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
Date         1258 non-null object
Open         1258 non-null float64
High         1258 non-null float64
Low          1258 non-null float64
Close        1258 non-null float64
Adj Close    1258 non-null float64
Volume       1258 non-null int64
dtypes: float64(5), int64(1), object(1)
memory usage: 68.9+ KB


> Note that Date is a type called `object`. We haven't talked about that yet, but it's basically a string. Let's summarize the data and prove that Date is left out of the summary.

In [0]:
fb.describe()
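
To see this behavior without the full file, here is a minimal sketch on a tiny made-up frame (the name `demo` and its values are assumptions for illustration, not the FB data). It suggests that `describe()` summarizes only the numeric columns by default, and that `include="all"` brings the string column back in.

```python
import pandas as pd

# Tiny made-up frame that mimics the FB layout: one string column, one numeric column
demo = pd.DataFrame({
    "Date": ["2014-03-20", "2014-03-21", "2014-03-24"],
    "Close": [66.97, 67.24, 64.10],
})

# By default, describe() summarizes only the numeric columns
numeric_summary = demo.describe()
print(numeric_summary.columns.tolist())

# include="all" also reports count/unique/top/freq for the string column
full_summary = demo.describe(include="all")
print(full_summary.columns.tolist())
```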

#### We could parse the file into columns

In [0]:
## parse into month/day/year columns
fb.head(2)

In [0]:
## year
fb['year'] = fb['Date'].str.slice(0,4)
fb.head()


In [0]:
## month
fb['month'] = fb['Date'].str.slice(5,7)
fb.head()

In [0]:
## day -- slice(8) has no stop value, so it behaves like [8:] and runs to the end of the string
fb['day'] = fb['Date'].str.slice(8)
fb.head()

## Why do this?

Now we can use these fields for summaries, or to add a level of detail to each record that we ultimately want to summarize, like sales per month or average sales by day. Summarizing data at a date/time level does take some work.

We will come back to this: once we have a real datetime, we can do this more explicitly. Still, this is a good exercise in parsing strings.
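
As a rough sketch of that kind of summary, the example below builds the `year` and `month` string columns the same way as above and averages a value per month. The frame `demo` and its `Close` values are made up for illustration.

```python
import pandas as pd

# Made-up data in the same "YYYY-MM-DD" string layout as the FB file
demo = pd.DataFrame({
    "Date": ["2014-03-20", "2014-03-21", "2014-04-01", "2014-04-02"],
    "Close": [10.0, 20.0, 30.0, 50.0],
})

# Slice the string into year and month pieces
demo["year"] = demo["Date"].str.slice(0, 4)
demo["month"] = demo["Date"].str.slice(5, 7)

# Average Close per year/month; the group keys are still strings here
monthly_avg = demo.groupby(["year", "month"])["Close"].mean()
print(monthly_avg)
```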

# Parse Dates using Pandas

In [0]:
# create a new column with date parsed
fb['date2'] = pd.to_datetime(fb['Date'])

In [0]:
fb.head()

In [0]:
fb.dtypes

# Parse the date upon reading the file

In [0]:
# read in the same Facebook file
fb2 = pd.read_csv("FB.csv", parse_dates=['Date'])

In [0]:
fb2.dtypes

> Look at the help, there is a lot we can do with dates, but this is the core of it and will get you a long way most of the time.

> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
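
If you want to try `parse_dates` without a file on disk, one option is to read from an in-memory string. This is only a sketch: the two-row `csv_text` is made up, and `io.StringIO` stands in for the real `FB.csv`.

```python
import io
import pandas as pd

# A made-up two-row CSV held in memory instead of a file on disk
csv_text = """Date,Close
2014-03-20,66.97
2014-03-21,67.24
"""

# Without parse_dates, the Date column is read as text
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the same column is converted to datetimes while reading
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(as_text["Date"].dtype)
print(as_dates["Date"].dtype)
```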

# Build a Date Range with `date_range`

In [0]:
N = 100
df = pd.DataFrame({'Date': pd.date_range(start="2019-01-01", periods=N, freq="D"),
                   'Value': np.random.randint(-10, 10, size=N)})

In [0]:
df.head()
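
The `freq` argument accepts more than daily steps. As a small sketch, the aliases `"B"` (business days, meaning Monday through Friday with no holiday calendar) and `"MS"` (month start) can be tried like this; the variable names are made up for illustration.

```python
import pandas as pd

# Five business days starting from a Tuesday; the weekend is skipped
business_days = pd.date_range(start="2019-01-01", periods=5, freq="B")
print(business_days)

# The first day of each month between two dates (both ends included)
month_starts = pd.date_range(start="2019-01-01", end="2019-04-01", freq="MS")
print(month_starts)
```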

# Worth Noting - define the format for a date/time in String form

We can also specify the format of the string 

http://strftime.org/

> The link above is also in the cheatsheets folder on the course site

In [0]:
# bring in Facebook again; we will explicitly define the date format in a later cell
fb3 = pd.read_csv("FB.csv")

In [0]:
fb3.dtypes

In [0]:
fb3.head(1)

In [0]:
# parse the date with explicit format
fb3['date'] = pd.to_datetime(fb3['Date'], format="%Y-%m-%d")

In [0]:
fb3.head()
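
The same format codes also work in the other direction. As a sketch on made-up values, `.dt.strftime` turns a datetime column back into strings in whatever layout you ask for.

```python
import pandas as pd

# Made-up date strings in the same layout as the FB file
raw = pd.Series(["2014-03-20", "2014-03-21"])

# String -> datetime, using an explicit format
parsed = pd.to_datetime(raw, format="%Y-%m-%d")

# Datetime -> string, using a different format (month/day/year)
us_style = parsed.dt.strftime("%m/%d/%Y")
print(us_style.tolist())
```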



---



In [0]:
# we can do this with other date formats
tmp_data = {'a':['01/01/2019'], 'b':['Tuesday Jan 1, 19'], 'c':['2019-03-20 18:30:25']}
more_parsing = pd.DataFrame(tmp_data)
more_parsing

In [0]:
# different ways to parse
more_parsing["dmy"] = pd.to_datetime(more_parsing['a'], format="%d/%m/%Y")
more_parsing

In [0]:
# mdy
more_parsing["mdy"] = pd.to_datetime(more_parsing['a'], format="%m/%d/%Y")
more_parsing

In [0]:

# more detailed parsing
more_parsing['parsed'] = pd.to_datetime(more_parsing['b'], format="%A %b %d, %y")
more_parsing

In [0]:
# finally, it's not just dates, but dates and times
more_parsing['datetime'] = pd.to_datetime(more_parsing['c'], format="%Y-%m-%d %H:%M:%S")
more_parsing

In [0]:
more_parsing.dtypes
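
The day/month order matters more when the day and month differ. With `01/01/2019` both formats give the same date, so here is a small sketch using a made-up value, `02/01/2019`, where the two formats disagree.

```python
import pandas as pd

# A made-up date string where day-first and month-first readings differ
ambiguous = pd.Series(["02/01/2019"])

# Read as day/month/year: the 2nd of January
as_dmy = pd.to_datetime(ambiguous, format="%d/%m/%Y")

# Read as month/day/year: the 1st of February
as_mdy = pd.to_datetime(ambiguous, format="%m/%d/%Y")

print(as_dmy[0], as_mdy[0])
```

Being explicit about the format is one way to avoid silently getting the wrong one of these two dates.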

# Extract Components of the Date/time

In [0]:
# use the df dataframe
df.head()

In [0]:
# extract out the day value
df['day'] = df['Date'].dt.day
df.head()

In [0]:
# extract out the month and year
df['month'] = df['Date'].dt.month
df['year'] = df['Date'].dt.year
df.tail()

In [0]:
# even parse out times -- use more_parsing from above
more_parsing['hour'] = more_parsing['datetime'].dt.hour
more_parsing
                             

In [0]:
# to round out the parsing, extract the minute and second
more_parsing['minute'] = more_parsing['datetime'].dt.minute
more_parsing['second'] = more_parsing['datetime'].dt.second
more_parsing
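
Once a column is a real datetime, the `.dt` pieces can feed straight into a summary, with no string slicing needed. A minimal sketch on made-up data (the frame `demo` and its values are assumptions for illustration):

```python
import pandas as pd

# Four made-up daily values that straddle the end of January 2019
demo = pd.DataFrame({
    "Date": pd.date_range(start="2019-01-30", periods=4, freq="D"),
    "Value": [1, 3, 5, 7],
})

# Name of the weekday for each date
demo["weekday"] = demo["Date"].dt.day_name()

# Total Value per calendar month, grouping directly on the extracted month
monthly_total = demo.groupby(demo["Date"].dt.month)["Value"].sum()

print(demo)
print(monthly_total)
```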
                 