# Formatting Data (dates)

It is very common to find dates (some combination of year, month, day of week and time) in data collected in real time, and in other data that organizes event information.

Let's look at a data frame whose dates come from an API.

In [2]:
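import pandas as pd
from sodapy import Socrata

# Unauthenticated client for Seattle's open data portal
# (None means no app token, so requests are subject to stricter throttling)
client = Socrata("data.seattle.gov", None)

# Request up to 2000 records from dataset "kzjm-xkqj"
results = client.get("kzjm-xkqj", limit=2000)

# Convert to pandas DataFrame
calls911 = pd.DataFrame.from_records(results)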
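If the API is unreachable, the rest of the steps can still be followed with a small offline stand-in. This is a minimal sketch, assuming the records arrive as a list of dicts with every value as a string; the name sample_records is hypothetical, and its values are copied from the first rows displayed later in this notebook.

```python
import pandas as pd

# Hypothetical records shaped like the API output: one dict per call,
# all values as strings (copied from rows shown in this notebook)
sample_records = [
    {"address": "4th Ave N / Mercer St", "type": "Triaged Incident",
     "datetime": "2022-08-18T10:15:00.000", "incident_number": "F220099503"},
    {"address": "815 S Dearborn St", "type": "Investigate Out Of Service",
     "datetime": "2022-08-18T10:08:00.000", "incident_number": "F220099502"},
]

# from_records turns each dict into a row and each key into a column
sample = pd.DataFrame.from_records(sample_records)
print(sample.shape)
print(type(sample.loc[0, "datetime"]))
```

As with the real data, the datetime column holds plain strings at this point.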



Let's check some information:

In [3]:
calls911.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 12 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   address                      2000 non-null   object
 1   type                         2000 non-null   object
 2   datetime                     2000 non-null   object
 3   latitude                     1996 non-null   object
 4   longitude                    1996 non-null   object
 5   report_location              1997 non-null   object
 6   incident_number              2000 non-null   object
 7   :@computed_region_ru88_fbhk  1992 non-null   object
 8   :@computed_region_kuhn_3gp2  1992 non-null   object
 9   :@computed_region_q256_3sug  1997 non-null   object
 10  :@computed_region_2day_rhn5  176 non-null    object
 11  :@computed_region_cyqu_gs94  165 non-null    object
dtypes: object(12)
memory usage: 187.6+ KB


Let's keep only the first seven columns, dropping the five _:@computed_region_ columns:

In [4]:
calls911 = calls911.iloc[:, :7]
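Selecting by position with iloc depends on the column order the API returns. A sketch of an alternative that selects by name, assuming every unwanted column shares the `:@computed_region` prefix seen in the info() output above (the small frame df is hypothetical):

```python
import pandas as pd

# Hypothetical frame carrying a subset of the original column names
df = pd.DataFrame(columns=["address", "type", "datetime",
                           ":@computed_region_ru88_fbhk",
                           ":@computed_region_kuhn_3gp2"])

# Keep every column whose name does NOT start with the Socrata prefix
keep = [c for c in df.columns if not c.startswith(":@computed_region")]
df = df[keep]
print(list(df.columns))  # ['address', 'type', 'datetime']
```

This version keeps working if the portal adds, removes, or reorders computed columns.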

Let's check the column _datetime_:

In [5]:
calls911.datetime.head()

0    2022-08-18T10:15:00.000
1    2022-08-18T10:08:00.000
2    2022-08-18T10:07:00.000
3    2022-08-18T09:11:00.000
4    2022-08-18T09:06:00.000
Name: datetime, dtype: object

In [6]:
# each value is still a plain Python string
type(calls911.datetime[0])


str

In this form, the date and time information is of limited use: each value is just a string, so pandas cannot extract the year, month, weekday or hour from it.
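To make the limitation concrete, here is a small sketch using two timestamps copied from the output above. The .dt accessor is refused for string data, and the only way to reach the hour is by slicing characters at fixed positions, which breaks as soon as the layout changes.

```python
import pandas as pd

raw = pd.Series(["2022-08-18T10:15:00.000", "2022-08-18T09:06:00.000"])

# pandas treats these values as text, so the .dt accessor is unavailable
try:
    raw.dt.hour
    dt_available = True
except AttributeError:
    dt_available = False

print(dt_available)  # False

# Only string slicing works; characters 11-12 hold the hour, as text
hours_as_text = raw.str[11:13].tolist()
print(hours_as_text)  # ['10', '09']
```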

Let's make it useful:

In [7]:
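# Parse the strings; the format must describe the whole value, e.g. 2022-08-18T10:15:00.000
calls911['datetime'] = pd.to_datetime(calls911['datetime'], format='%Y-%m-%dT%H:%M:%S.%f')
calls911.info()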

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   address          2000 non-null   object        
 1   type             2000 non-null   object        
 2   datetime         2000 non-null   datetime64[ns]
 3   latitude         1996 non-null   object        
 4   longitude        1996 non-null   object        
 5   report_location  1997 non-null   object        
 6   incident_number  2000 non-null   object        
dtypes: datetime64[ns](1), object(6)
memory usage: 109.5+ KB
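Recent pandas versions match the format string strictly, so it should cover the full value, including the T separator and the fractional seconds (%f). The sketch below also shows errors="coerce", which turns values that do not fit the format into NaT instead of raising; the third string is a hypothetical bad value, not something from the dataset.

```python
import pandas as pd

raw = pd.Series(["2022-08-18T10:15:00.000",
                 "2022-08-18T09:06:00.000",
                 "not a date"])  # hypothetical malformed entry

# Full ISO-style layout; unparseable values become NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%dT%H:%M:%S.%f", errors="coerce")

print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Counting the NaT values after parsing is a quick way to spot malformed timestamps before analysing them.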


Once the column has the datetime64 type, its _.dt_ accessor lets you retrieve the individual components (date, year, month, weekday and hour):

In [8]:
calls911['date']=calls911.datetime.dt.date
calls911['year']=calls911.datetime.dt.year
calls911['month']=calls911.datetime.dt.month_name()
calls911['weekday']=calls911.datetime.dt.day_name()
calls911['hour']=calls911.datetime.dt.hour

In [9]:
calls911.head()

| | address | type | datetime | latitude | longitude | report_location | incident_number | date | year | month | weekday | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4th Ave N / Mercer St | Triaged Incident | 2022-08-18 10:15:00 | 47.624564 | -122.348877 | {'type': 'Point', 'coordinates': [-122.348877,... | F220099503 | 2022-08-18 | 2022 | August | Thursday | 10 |
| 1 | 815 S Dearborn St | Investigate Out Of Service | 2022-08-18 10:08:00 | 47.595831 | -122.322292 | {'type': 'Point', 'coordinates': [-122.322292,... | F220099502 | 2022-08-18 | 2022 | August | Thursday | 10 |
| 2 | 9401 Myers Way S | Triaged Incident | 2022-08-18 10:07:00 | 47.518658 | -122.333265 | {'type': 'Point', 'coordinates': [-122.333265,... | F220099501 | 2022-08-18 | 2022 | August | Thursday | 10 |
| 3 | 11030 5th Ave Ne | Auto Fire Alarm | 2022-08-18 09:11:00 | 47.709488 | -122.323301 | {'type': 'Point', 'coordinates': [-122.323301,... | F220099179 | 2022-08-18 | 2022 | August | Thursday | 9 |
| 4 | 3013 Harvard Ave E | MVI - Motor Vehicle Incident | 2022-08-18 09:06:00 | 47.647935 | -122.322101 | {'type': 'Point', 'coordinates': [-122.322101,... | F220099178 | 2022-08-18 | 2022 | August | Thursday | 9 |
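The same accessors can be tried on two known timestamps, taken from the first and last dates in this extract (18 August 2022 was a Thursday, 12 August 2022 a Friday). The sketch also adds .dt.dayofweek (Monday=0), which is not used in the notebook but can help when sorting, since day names sort alphabetically rather than in weekly order.

```python
import pandas as pd

stamps = pd.to_datetime(pd.Series(["2022-08-18 10:15:00",
                                   "2022-08-12 23:40:00"]))

parts = pd.DataFrame({
    "date": stamps.dt.date,              # datetime.date objects
    "year": stamps.dt.year,
    "month": stamps.dt.month_name(),     # e.g. 'August'
    "weekday": stamps.dt.day_name(),     # e.g. 'Thursday'
    "weekday_num": stamps.dt.dayofweek,  # Monday=0 ... Sunday=6
    "hour": stamps.dt.hour,
})
print(parts)
```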


Let's create a new column from what we have: a boolean that tells whether the call came in at night (from 8 pm until 6 am):

In [10]:
# Night: hours 20-23 (from 8 pm) or 0-5 (before 6 am)
calls911['nightTime'] = (calls911['hour'] < 6) | (calls911['hour'] >= 20)
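A quick way to check which hours a night-time rule selects is to apply it to all 24 hours. This sketch assumes the boundaries stated in the text (8 pm inclusive, 6 am exclusive) and compares the comparison-based mask with an equivalent one built from Series.between, whose bounds are inclusive by default.

```python
import pandas as pd

hours = pd.Series(range(24))

# Night: from hour 20 (8 pm) up to, but not including, hour 6 (6 am)
night = (hours < 6) | (hours >= 20)

# Equivalent: not within the daytime range 6..19 (both ends inclusive)
night_alt = ~hours.between(6, 19)

print(hours[night].tolist())
# [0, 1, 2, 3, 4, 5, 20, 21, 22, 23]
```

Listing the selected hours like this makes off-by-one mistakes at the boundaries (such as using <= 6 instead of < 6) easy to spot.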

Let's look at the result before saving it:

In [11]:
calls911

| | address | type | datetime | latitude | longitude | report_location | incident_number | date | year | month | weekday | hour | nightTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4th Ave N / Mercer St | Triaged Incident | 2022-08-18 10:15:00 | 47.624564 | -122.348877 | {'type': 'Point', 'coordinates': [-122.348877,... | F220099503 | 2022-08-18 | 2022 | August | Thursday | 10 | False |
| 1 | 815 S Dearborn St | Investigate Out Of Service | 2022-08-18 10:08:00 | 47.595831 | -122.322292 | {'type': 'Point', 'coordinates': [-122.322292,... | F220099502 | 2022-08-18 | 2022 | August | Thursday | 10 | False |
| 2 | 9401 Myers Way S | Triaged Incident | 2022-08-18 10:07:00 | 47.518658 | -122.333265 | {'type': 'Point', 'coordinates': [-122.333265,... | F220099501 | 2022-08-18 | 2022 | August | Thursday | 10 | False |
| 3 | 11030 5th Ave Ne | Auto Fire Alarm | 2022-08-18 09:11:00 | 47.709488 | -122.323301 | {'type': 'Point', 'coordinates': [-122.323301,... | F220099179 | 2022-08-18 | 2022 | August | Thursday | 9 | False |
| 4 | 3013 Harvard Ave E | MVI - Motor Vehicle Incident | 2022-08-18 09:06:00 | 47.647935 | -122.322101 | {'type': 'Point', 'coordinates': [-122.322101,... | F220099178 | 2022-08-18 | 2022 | August | Thursday | 9 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1236 S King St | Aid Response | 2022-08-12 12:07:00 | 47.598337 | -122.316533 | {'type': 'Point', 'coordinates': [-122.316533,... | F220096444 | 2022-08-12 | 2022 | August | Friday | 12 | False |
| 1996 | 6312 California Ave Sw | Medic Response | 2022-08-12 12:02:00 | 47.546604 | -122.387196 | {'type': 'Point', 'coordinates': [-122.387196,... | F220096443 | 2022-08-12 | 2022 | August | Friday | 12 | False |
| 1997 | 1023 E Alder St | Auto Fire Alarm | 2022-08-12 12:00:00 | 47.60436 | -122.319104 | {'type': 'Point', 'coordinates': [-122.319104,... | F220096442 | 2022-08-12 | 2022 | August | Friday | 12 | False |
| 1998 | 1401 2nd Ave | Aid Response | 2022-08-12 11:58:00 | 47.608292 | -122.337995 | {'type': 'Point', 'coordinates': [-122.337995,... | F220096441 | 2022-08-12 | 2022 | August | Friday | 11 | False |


In [12]:
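import os

# Create the target folder if needed, then store the frame as a pickle
os.makedirs('DataFiles', exist_ok=True)
where = os.path.join('DataFiles', 'calls911.pkl')
calls911.to_pickle(where)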
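One practical effect of saving to a pickle is that the column types survive the round trip, while a CSV stores timestamps as text. The sketch below checks this on a hypothetical one-row frame written to a temporary folder, so nothing in DataFiles is touched.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row frame with a parsed datetime column
df = pd.DataFrame({"datetime": pd.to_datetime(["2022-08-18 10:15:00"]),
                   "hour": [10]})

with tempfile.TemporaryDirectory() as folder:
    pkl_path = os.path.join(folder, "sample.pkl")
    csv_path = os.path.join(folder, "sample.csv")
    df.to_pickle(pkl_path)
    df.to_csv(csv_path, index=False)

    from_pickle = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# The pickle keeps datetime64; the CSV column comes back as strings
print(from_pickle.dtypes["datetime"])
print(from_csv.dtypes["datetime"])
```

If a CSV is required, passing parse_dates=["datetime"] to pd.read_csv converts the column back when reading. Note that pickle files should only be loaded from trusted sources, since unpickling can execute code.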