# Better date handling


In [1]:
import pandas as pd
import numpy as np

We will work on the `orders` and `details` dataframes.
This time, we do not pass the `parse_dates` option to `read_csv`, so the date columns are loaded as plain strings.

In [2]:
orders = pd.read_csv("https://github.com/gdv/foundationsCS/raw/main/students/ex-data/Northwind/Orders.csv")

In [3]:
details = pd.read_csv("https://github.com/gdv/foundationsCS/raw/main/students/ex-data/Northwind/OrderDetails.csv")

Sometimes we need to specify the format of the field containing the date. We can use `to_datetime` and its `format` option (see the format [specification](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)).
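As a minimal sketch of the `format` option, the cell below parses day-first dates written with slashes. The series `raw` is made-up sample data, not taken from `Orders.csv`.

```python
import pandas as pd

# Hypothetical sample data (not from Orders.csv): day/month/year with slashes
raw = pd.Series(["04/07/2012", "05/07/2012", "08/07/2012"])

# %d = day of month, %m = month number, %Y = four-digit year
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

print(parsed.dt.day.tolist())    # [4, 5, 8]
print(parsed.dt.month.tolist())  # [7, 7, 7]
```

Without the explicit format, a string such as `04/07/2012` is ambiguous (4 July or 7 April), which is one reason to state the format.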

In [4]:
orders['OrderDate'].head()

0    2012-07-04
1    2012-07-05
2    2012-07-08
3    2012-07-08
4    2012-07-09
Name: OrderDate, dtype: object

In [5]:
orders['OrderDate'].tail()

16813    2013-06-29 21:05:55
16814    2014-01-19 12:27:11
16815    2014-10-15 09:51:09
16816    2013-02-07 02:06:05
16817    2013-08-31 02:59:28
Name: OrderDate, dtype: object

In [6]:
pd.to_datetime(orders['OrderDate'], format = '%Y-%m-%d %H:%M:%S')

ValueError: time data "2012-07-04" doesn't match format "%Y-%m-%d %H:%M:%S", at position 0. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

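The column mixes date-only strings with full timestamps, so a single `format` cannot match every row. As the error message suggests, `format = 'mixed'` infers the format of each element individually.

In [None]: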
orders['parsed_date'] = pd.to_datetime(orders['OrderDate'], format = 'mixed')

In [None]:
orders['parsed_date'].head()
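The following sketch reproduces the situation on a two-element series, assuming pandas 2.0 or later (where `format='mixed'` is available). The series `sample` is hypothetical. It also shows `errors='coerce'`, which can help locate the rows that do not match a strict format.

```python
import pandas as pd

# Hypothetical sample mimicking the two formats seen in OrderDate
sample = pd.Series(["2012-07-04", "2013-06-29 21:05:55"])

# With errors="coerce", rows that do not match the format become NaT
# instead of raising a ValueError
strict = pd.to_datetime(sample, format="%Y-%m-%d %H:%M:%S", errors="coerce")
print(strict.isna().tolist())  # [True, False]

# format="mixed" infers the format of each element individually
mixed = pd.to_datetime(sample, format="mixed")
print(mixed.dt.hour.tolist())  # [0, 21]
```

Date-only strings are parsed as midnight of that day, which is why the first hour is 0.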

Parse the column `RequiredDate`.

In [None]:
pd.to_datetime(orders['RequiredDate'], format = 'mixed')

## String/Regex manipulation

Extract the orders shipped to Europe (look at the `ShipRegion` column).

In [None]:
orders['ShipRegion'].head()

In [None]:
orders['ShipRegion'].str.contains('Europe')

In [None]:
orders['ShipRegion'].str.contains('[Ee]urope')
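`str.contains` only returns a boolean mask. One way to obtain the matching rows is to index the dataframe with that mask, as sketched below on a small made-up dataframe `demo` (its values are assumptions, not the real data). Passing `na=False` treats missing regions as non-matching, so the mask contains only `True` and `False`.

```python
import pandas as pd

# Hypothetical miniature of the orders dataframe
demo = pd.DataFrame({
    "OrderID": [1, 2, 3, 4],
    "ShipRegion": ["Western Europe", "North America", None, "eastern europe"],
})

# Boolean mask: True where the region mentions Europe (either capitalization)
mask = demo["ShipRegion"].str.contains(r"[Ee]urope", regex=True, na=False)

# Keep only the rows where the mask is True
europe_orders = demo[mask]
print(europe_orders["OrderID"].tolist())  # [1, 4]
```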

Build a new column that contains the continent (`Europe`) instead of the specific region.

In [None]:
orders['ShipRegion'].str.extract(r'([Ee]urope)')

Another way is to remove the portion of the text preceding `Europe`.

In [None]:
# Raw string, so that \s reaches the regex engine without an invalid-escape warning
orders['ShipRegion'].str.replace(r'.*\s[Ee]urope', 'Europe', regex = True)
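The two approaches behave differently on rows that are not in Europe. The sketch below compares them on a made-up dataframe `demo`; the column name `Continent` is a hypothetical choice.

```python
import pandas as pd

# Hypothetical sample of regions
demo = pd.DataFrame({
    "ShipRegion": ["Western Europe", "Northern Europe", "North America"],
})

# str.extract keeps only the captured group; non-matching rows become NaN.
# expand=False returns a Series instead of a one-column DataFrame.
extracted = demo["ShipRegion"].str.extract(r"([Ee]urope)", expand=False)
print(extracted.isna().tolist())  # [False, False, True]

# str.replace rewrites matching rows and leaves the others unchanged
demo["Continent"] = demo["ShipRegion"].str.replace(
    r".*\s[Ee]urope", "Europe", regex=True
)
print(demo["Continent"].tolist())  # ['Europe', 'Europe', 'North America']
```

`str.replace` is probably the better fit when the non-European regions should be kept as they are.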

## Zipping lists

In [None]:
lista = [1, 2, 3, 4]
listb = "abcd"

In [None]:
list(zip(lista, listb))
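A short sketch of how `zip` behaves, using the same kind of inputs as above (the variable names here are illustrative). It pairs elements by position, accepts any iterable (including a string), and stops at the shortest input.

```python
numbers = [1, 2, 3, 4]
letters = "abcd"  # a string is iterable, so zip pairs its characters

# Pair elements position by position
pairs = list(zip(numbers, letters))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

# zip stops as soon as the shortest input is exhausted
shorter = list(zip(numbers, "ab"))
print(shorter)  # [(1, 'a'), (2, 'b')]

# zip(*pairs) reverses the operation, giving back one tuple per original input
unzipped_numbers, unzipped_letters = zip(*pairs)
print(unzipped_numbers)  # (1, 2, 3, 4)

# A common use: build a dictionary from two parallel sequences
lookup = dict(zip(letters, numbers))
print(lookup["c"])  # 3
```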