# Formatting Data (dates)

It is very common to find dates (some combination of year, month, day of the week and time) in data that is collected in real time, and in other data sets that organize information about events.

Let's see a data frame that comes with dates from an API.

In [None]:
import pandas as pd
from sodapy import Socrata

client = Socrata("data.seattle.gov", None)

results = client.get("kzjm-xkqj", limit=2000)

# Convert to pandas DataFrame
calls911 = pd.DataFrame.from_records(results)

Let's check some information:

In [None]:
calls911.info()

Let's get rid of some columns, keeping only the first seven:

In [None]:
calls911=calls911.iloc[:,:7]

Let's check the column _datetime_:

In [None]:
calls911.datetime.head()

In [None]:
# then
type(calls911.datetime[0])


The date and time information is of limited use at this point, as each value is just a string.

Let's make it useful:

In [None]:
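# let pandas infer the layout: a date-only format such as '%Y-%m-%d'
# fails when the strings also carry a time of day
calls911.datetime=pd.to_datetime(calls911.datetime)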
calls911.info()
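
To see the conversion in isolation, here is a minimal sketch on two made-up timestamps. The values are invented for illustration and are assumed to follow an ISO-like layout; they are not taken from the Seattle data.

```python
import pandas as pd

# Hypothetical sample values (assumed ISO-like layout, not real records)
sample = pd.Series(["2023-01-15T08:30:00.000", "2023-07-04T22:05:00.000"])

# Without an explicit format, pandas infers the layout from the strings
parsed = pd.to_datetime(sample)

print(type(sample[0]))   # each original value is a plain string
print(parsed.dtype)      # the converted column has a datetime dtype
```

If the strings in your own data follow a different layout, you may need to pass a matching `format` argument.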

Once you have this data type, you can retrieve important information:

In [None]:
calls911['date']=calls911.datetime.dt.date
calls911['year']=calls911.datetime.dt.year
calls911['month']=calls911.datetime.dt.month_name()
calls911['weekday']=calls911.datetime.dt.day_name()
calls911['hour']=calls911.datetime.dt.hour

In [None]:
calls911.head()
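
The same accessors can be tried on a tiny, self-contained example. This is only a sketch: the two timestamps below are invented, and the names `stamps` and `parts` are hypothetical.

```python
import pandas as pd

# Two invented timestamps, already converted to a datetime dtype
stamps = pd.Series(pd.to_datetime(["2023-01-15 08:30:00", "2023-07-04 22:05:00"]))

# The .dt accessor exposes the components of every value in the column
parts = pd.DataFrame({
    "year": stamps.dt.year,
    "month": stamps.dt.month_name(),
    "weekday": stamps.dt.day_name(),
    "hour": stamps.dt.hour,
})

print(parts)
```

Note that `month_name()` and `day_name()` are methods (they need parentheses), while `year` and `hour` are plain attributes.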

Let's create a new column from what we have. In this case, a boolean that tells whether the call happened at night (from 8 pm up to, but not including, 6 am):

In [None]:
# hours 20-23 and 0-5 count as night; 6 am itself is excluded
calls911['nightTime']=((calls911['hour']<6) | (calls911['hour']>=20))
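
As a quick check of the boundary hours, here is a small sketch on an invented list of hours. It assumes night runs from 8 pm up to, but not including, 6 am, which is one reading of the definition in the text.

```python
import pandas as pd

# Invented hours that sit around both boundaries
hours = pd.Series([0, 5, 6, 19, 20, 23])

# Each comparison yields a boolean Series; | combines them element-wise.
# The parentheses are required because | binds tighter than < and >=.
night = (hours < 6) | (hours >= 20)

print(night.tolist())  # [True, True, False, False, True, True]
```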

Let's save what we have:

In [None]:
import os

where=os.path.join('DataFiles','calls911.pkl')
# create the folder first in case it does not exist yet
os.makedirs('DataFiles', exist_ok=True)
calls911.to_pickle(where)
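
One reason to choose a pickle file here is that it keeps the data types, so the datetime column does not need to be converted again after loading. The sketch below illustrates this with an invented one-row data frame and a temporary folder; the file name `demo.pkl` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Invented one-row frame with a datetime column
df = pd.DataFrame({"datetime": pd.to_datetime(["2023-01-15 08:30:00"])})

# Write to a temporary folder and read the file back
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.dtypes)  # the datetime dtype survives the round trip
```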