# Dates and Times with pandas.DatetimeIndex

When working with data, we often need to manipulate dates and times. The format in which a dataset provides them is not always one we can use directly, so converting it into a usable form is a core part of preparing a dataset.

In this notebook, we shall explore how the Pandas `DatetimeIndex` class helps transform date data into a format we can use for further exploration.

Import Pandas and read in the data. We shall use the [kiva loans dataset](https://www.kaggle.com/kiva/data-science-for-good-kiva-crowdfunding?select=kiva_loans.csv). We shall use just a subset of the columns in this notebook.

In [1]:
import pandas as pd

df = pd.read_csv("data/kiva_loans.csv", usecols=[2,4,7,13])
df.head()

| | loan_amount | sector | country | funded_time |
|---|---|---|---|---|
| 0 | 300.0 | Food | Pakistan | 2014-01-02 10:06:32+00:00 |
| 1 | 575.0 | Transportation | Pakistan | 2014-01-02 09:17:23+00:00 |
| 2 | 150.0 | Transportation | India | 2014-01-01 16:01:36+00:00 |
| 3 | 200.0 | Arts | Pakistan | 2014-01-01 13:00:00+00:00 |
| 4 | 400.0 | Food | Pakistan | 2014-01-01 19:18:51+00:00 |


Say you want to parse the `funded_time` column and get the different portions of the date (second, minute, hour, day, month, year). You could use regular text processing to do this (similar to the methods in [this article](https://medium.com/analytics-vidhya/parsing-a-text-column-with-python-bf8fde6a771a)), or you could use the Pandas `DatetimeIndex` class. In this notebook, we shall use `DatetimeIndex`.

If you explore the data types of the various columns, you will find that the `funded_time` column has the `object` data type.

In [2]:
df.dtypes

loan_amount    float64
sector          object
country         object
funded_time     object
dtype: object

First we need to convert the values in that column into `datetime` objects. For that we use the `pandas.to_datetime` function.

In [3]:
time_cols = list(df.columns)[3:]

for i in time_cols:
    df[i] = pd.to_datetime(df[i])

When you now check, you will see that the `funded_time` column is of type `datetime64[ns, UTC]`.

In [4]:
df.dtypes

loan_amount                float64
sector                      object
country                     object
funded_time    datetime64[ns, UTC]
dtype: object
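Real columns often contain missing or malformed values. The following is a minimal sketch, not part of the original notebook, of how `pd.to_datetime` can be asked to handle them. The `raw` series is hypothetical sample data, and the choice of `utc=True` and `errors="coerce"` is an assumption about what suits such a column.

```python
import pandas as pd

# Hypothetical sample values mimicking the funded_time column, with one missing entry
raw = pd.Series(["2014-01-02 10:06:32+00:00", None, "2014-01-01 16:01:36+00:00"])

# utc=True yields a timezone-aware UTC column;
# errors="coerce" turns values that cannot be parsed into NaT instead of raising
parsed = pd.to_datetime(raw, utc=True, errors="coerce")

print(parsed.dtype)          # a timezone-aware datetime dtype
print(parsed.isna().sum())   # number of missing timestamps
```

With the default `errors="raise"`, a malformed string would stop the conversion with an exception, which may be preferable when bad values should not pass silently.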

We can now proceed to process these columns using the `DatetimeIndex`. Check out the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html) for more details.  
Before processing all the rows, let us first pick out a single timestamp and use it to walk through the various methods (denoted by parentheses `()` after the name) and attributes.

In [5]:
sample = df.loc[0,"funded_time"]
sample

Timestamp('2014-01-02 10:06:32+0000', tz='UTC')
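A single value such as `sample` is a `Timestamp`, while a whole column can be wrapped in a `DatetimeIndex`. As a rough sketch (the `times` series below is hypothetical data, not taken from the dataset), attributes and methods such as `year` and `month_name()` behave the same on both, except that the index evaluates them for every value at once.

```python
import pandas as pd

# Hypothetical timestamps, assumed here only for illustration
times = pd.Series(
    pd.to_datetime(["2014-01-02 10:06:32", "2015-05-08 15:09:06"], utc=True)
)

single = times.iloc[0]            # a scalar pandas.Timestamp
index = pd.DatetimeIndex(times)   # the whole column as a DatetimeIndex

print(single.year)                # attribute on one value
print(list(index.year))           # same attribute, evaluated for every value
print(list(index.month_name()))   # methods work the same way
```

Not every name matches exactly: on a `Timestamp`, `date()` and `time()` are methods, whereas on a `DatetimeIndex`, `date` and `time` are attributes.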

## Dates
### Date
`date()` returns just the date portion of the time stamp.

In [6]:
print(sample.date())

2014-01-02


### Year
`year` returns the year.  
`is_year_start` tests to see if it is the first day of the year.  
`is_year_end` tests to see if it is the last day of the year.  
`is_leap_year` tests to see if it is a leap year.  

In [7]:
print(sample.year,"\n",sample.is_year_start,"\n",sample.is_year_end,"\n",sample.is_leap_year)

2014 
 False 
 False 
 False


### Quarter
`quarter` returns the quarter number of the year.  
`is_quarter_start` tests to see if it is the first day of the quarter.  
`is_quarter_end` tests to see if it is the last day of the quarter.  

In [8]:
print(sample.quarter,"\n",sample.is_quarter_start,"\n",sample.is_quarter_end)

1 
 False 
 False


### Month
`month` returns the month number.  
`month_name()` returns the name of the month.  
`is_month_start` tests to see if it is the first day of the month.  
`is_month_end` tests to see if it is the last day of the month.

In [9]:
print(sample.month,"\n",sample.month_name(),"\n",sample.is_month_start,"\n",sample.is_month_end)

1 
 January 
 False 
 False


### Day
`day` returns the day of the month.  
`day_name()` returns the name of the day.  
`dayofweek` returns the ordinal position of the day in a week. Monday is considered the first, with an ordinal position of 0.  
`dayofyear` returns the number of the day in the year.  
`daysinmonth` returns the number of days in that month.  

In [10]:
print(sample.day,"\n",sample.day_name(),"\n",sample.dayofweek,"\n",sample.dayofyear,"\n",sample.daysinmonth)

2 
 Thursday 
 3 
 2 
 31
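Because `dayofweek` counts from Monday as 0, one possible use is flagging weekends. The sketch below assumes two hypothetical timestamps (a Thursday and a Saturday) and a hypothetical variable name `is_weekend`; it is an illustration, not a step from the original notebook.

```python
import pandas as pd

# Hypothetical timestamps: a Thursday and a Saturday
stamps = pd.DatetimeIndex(["2014-01-02 10:06:32", "2014-01-04 08:00:00"])

# dayofweek counts from Monday=0, so Saturday and Sunday are 5 and 6
is_weekend = stamps.dayofweek >= 5

print(list(stamps.day_name()))
print(list(is_weekend))
```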


## Times
### Time
`time()` returns just the time portion of the time stamp.

In [11]:
print(sample.time())

10:06:32


### Hour
`hour` returns the hour of the day.

In [12]:
print(sample.hour)

10


### Minute
`minute` returns the minute of the hour.

In [13]:
print(sample.minute)

6


### Second
`second` returns the second of the minute.

In [14]:
print(sample.second)

32


## Timezones
`normalize()` sets the time portion to midnight while keeping the timezone. It is useful in cases where the time does not matter, and all you are interested in is the date.

In [15]:
print(sample.normalize())

2014-01-02 00:00:00+00:00
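`normalize()` and `date()` can look interchangeable, but they return different types. A small sketch, rebuilding the same `sample` value by hand so it runs on its own:

```python
import pandas as pd

# Same value as the sample timestamp used in this notebook
sample = pd.Timestamp("2014-01-02 10:06:32", tz="UTC")

midnight = sample.normalize()   # still a pandas Timestamp, timezone kept
plain = sample.date()           # a standard-library datetime.date, no timezone

print(type(midnight).__name__, midnight.tz)
print(type(plain).__name__)
```

Keeping a `Timestamp` is usually more convenient when further date arithmetic or timezone conversion is planned.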


`tz` returns the timezone of the timestamp.

In [16]:
print(sample.tz)

UTC


`tz_convert` converts the timestamp to the specified timezone. Provide the new timezone as an IANA name of the form `Continent/City`.

In [17]:
zt = "Africa/Nairobi"
print(sample.tz_convert(zt),sample.tz_convert(zt).tz)

2014-01-02 13:06:32+03:00 Africa/Nairobi
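`tz_convert` only works on timestamps that already carry a timezone. For timestamps without one, `tz_localize` attaches a timezone instead. The sketch below contrasts the two; the `naive` value is a hypothetical example, since the timestamps in this dataset are already in UTC.

```python
import pandas as pd

aware = pd.Timestamp("2014-01-02 10:06:32", tz="UTC")
naive = pd.Timestamp("2014-01-02 10:06:32")   # no timezone attached

# tz_convert keeps the same instant and changes the wall-clock reading
print(aware.tz_convert("Africa/Nairobi"))

# tz_localize attaches a timezone to a naive timestamp without shifting the clock
print(naive.tz_localize("Africa/Nairobi"))

# Calling tz_convert on a naive timestamp raises a TypeError
try:
    naive.tz_convert("Africa/Nairobi")
except TypeError as exc:
    print("TypeError:", exc)
```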


## Data Processing
Now that we have seen some of the attributes and methods of `DatetimeIndex`, let us use them in processing our data.

In [18]:
df.sample(1000)

| | loan_amount | sector | country | funded_time |
|---|---|---|---|---|
| 231158 | 425.0 | Retail | Mozambique | 2015-05-08 15:09:06+00:00 |
| 496813 | 275.0 | Food | Philippines | 2016-10-03 23:05:06+00:00 |
| 135791 | 250.0 | Food | India | 2014-11-02 13:56:06+00:00 |
| 396269 | 550.0 | Clothing | Palestine | 2016-04-18 14:18:19+00:00 |
| 492414 | 1000.0 | Transportation | Kenya | NaT |
| ... | ... | ... | ... | ... |
| 46792 | 250.0 | Retail | Philippines | 2014-04-21 18:24:07+00:00 |
| 49195 | 1675.0 | Services | Kosovo | 2014-05-20 14:02:47+00:00 |
| 618286 | 175.0 | Personal Use | Mexico | 2017-04-18 17:25:05+00:00 |
| 548776 | 625.0 | Retail | Philippines | 2017-01-03 08:49:28+00:00 |
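The sample above includes a `NaT` (not a time) in `funded_time`. Missing timestamps carry over into any component extracted from them. Here is a minimal sketch using a hypothetical two-row `funded` series rather than the real data:

```python
import pandas as pd

# Hypothetical column with one missing timestamp (NaT), as seen in the sample above
funded = pd.Series(pd.to_datetime(["2015-05-08 15:09:06", None], utc=True))

# Extracting a component keeps the gap: the missing row has no year
years = pd.DatetimeIndex(funded).year

print(list(years))
print(funded.isna().sum())   # how many timestamps are missing
```

Depending on the analysis, such rows might be dropped with `dropna` or kept and handled explicitly; which is appropriate is a judgment call for the dataset at hand.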
