# 3.5 Date Formatting

Working with dates in Pandas can be slightly tricky at first, but it is often essential in data analysis. Pandas often reads date fields as strings, but once a column is assigned the *datetime* data type, it gains access to additional methods that can improve analysis. For example, the analyst can extract the month number or the number of days since a given date, which can show how the data changes over time.
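As a rough illustration of those two operations, the sketch below parses a pair of made-up date strings and then extracts the month number and the days elapsed since a reference date. The sample values and the names `raw`, `parsed`, and `reference` are assumptions for illustration only and do not come from the notebook's data.

```python
import pandas as pd

# Hypothetical sample values, not taken from this notebook's data set
raw = pd.Series(["2021-01-15", "2021-03-01"])
parsed = pd.to_datetime(raw, format="%Y-%m-%d")

# Month number of each date
months = parsed.dt.month

# Whole days elapsed between each date and an arbitrary reference date
reference = pd.Timestamp("2021-01-01")
days_since = (parsed - reference).dt.days

print(months.tolist())      # [1, 3]
print(days_since.tolist())  # [14, 59]
```

The `.dt` accessor is only available once the Series holds datetime values; calling it on the raw strings raises an error.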

### About the data
Since the *Titanic* data set does not contain date fields, it is not used in this notebook. Instead, this notebook uses data on earthquake occurrences in Greece.

In [2]:
import pandas as pd
df = pd.read_csv("./data/earthquakes.csv")

Notice that this dataset has a "DATETIME" field. However, Pandas does not recognize this column as a datetime column; instead, it treats it as an `object` (string) column.

In [3]:
df.head()

| | DATETIME | LAT | LONG | DEPTH | MAGNITUDE |
|---|---|---|---|---|---|
| 0 | 1/7/1965 10:22 | 36.5 | 26.5 | 10 | 5.3 |
| 1 | 1/10/1965 8:02 | 39.25 | 22.25 | 10 | 4.9 |
| 2 | 1/12/1965 17:26 | 37.0 | 22.0 | 10 | 4.0 |
| 3 | 1/15/1965 14:56 | 36.75 | 21.75 | 10 | 4.5 |
| 4 | 3/9/1965 19:16 | 39.0 | 24.0 | 10 | 4.2 |


In [4]:
df.dtypes

DATETIME      object
LAT          float64
LONG         float64
DEPTH          int64
MAGNITUDE    float64
dtype: object

### Casting a column to a datetime type
We can use the `to_datetime()` function to cast all of the values in a column to a datetime type. However, Pandas needs to know which date part each number in the column represents. In other words, is the order "day/month/year" or "month/day/year"?

The `to_datetime()` function is a **Pandas function** (not a dataframe method) that accepts a Series object.
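To see why the order of date parts matters, here is a minimal sketch that parses the same string with two different formats. The single-value Series `ambiguous` is a hypothetical example, built by hand rather than read from the earthquake file.

```python
import pandas as pd

# Hypothetical value: could mean 7 January or 1 July
ambiguous = pd.Series(["1/7/1965 10:22"])

# pd.to_datetime is called on the pandas module and is given a Series
month_first = pd.to_datetime(ambiguous, format="%m/%d/%Y %H:%M")
day_first = pd.to_datetime(ambiguous, format="%d/%m/%Y %H:%M")

print(month_first[0])  # 1965-01-07 10:22:00
print(day_first[0])    # 1965-07-01 10:22:00
```

Both calls succeed without any warning, so a wrong format can silently shift dates by months. Stating the format explicitly makes the intended interpretation visible in the code.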

We can pass in a [Python format code](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) to tell Pandas how to interpret the date. The formatting codes aren't something you need to memorize, but you should keep them handy for future reference. Each code represents a part of the date. For example, `%B` indicates a full month name (e.g., January), whereas `%Y` indicates a four-digit year (e.g., 2023).
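The same format codes work in Python's standard `datetime` module, which can be a convenient place to experiment with them. The sketch below parses one string and prints it back in two layouts; the variable name `stamp` is just an illustrative choice, and the `%B` output assumes the default English locale.

```python
from datetime import datetime

# Parse a string using month/day/year hour:minute codes
stamp = datetime.strptime("1/7/1965 10:22", "%m/%d/%Y %H:%M")

# Format the parsed value back into text with different codes
print(stamp.strftime("%Y-%m-%d %H:%M"))  # 1965-01-07 10:22
print(stamp.strftime("%B %d, %Y"))       # January 07, 1965 (English locale)
```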

In [5]:
df['DATETIME'] = pd.to_datetime(df['DATETIME'], format="%m/%d/%Y %H:%M")

In [6]:
df.dtypes

DATETIME     datetime64[ns]
LAT                 float64
LONG                float64
DEPTH                 int64
MAGNITUDE           float64
dtype: object

In [7]:
df['DATETIME']

0        1965-01-07 10:22:00
1        1965-01-10 08:02:00
2        1965-01-12 17:26:00
3        1965-01-15 14:56:00
4        1965-03-09 19:16:00
                 ...        
251258   2021-12-31 22:55:00
251259   2021-12-31 23:03:00
251260   2021-12-31 23:31:00
251261   2021-12-31 23:36:00
251262   2021-12-31 23:36:00
Name: DATETIME, Length: 251263, dtype: datetime64[ns]

### Using datetime methods