# Pandas: Select rows between two dates - DataFrame or CSV file

## Resources

* [pandas.to_datetime](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)
* [pandas.DataFrame.between_time](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.between_time.html)
* [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html)

## Use cases

* Pandas: Verify columns containing dates
* Convert string to datetime in DataFrame
* Select rows between two dates
    * 1. Select rows based on dates with loc
    * 2. Series method between
    * 3. Select rows between two times
    * 4. Select rows based on dates without loc
    * 5. Use mask to mark the records
    * 6. Select records from last month/30 days 

## Step 1: Import Pandas and read data

In [1]:
import pandas as pd
df = pd.read_csv("../csv/data.csv")
df

Unnamed: 0,loading_datetime,pages,title,datetime_col
0,2019-10-28 19:56:03,main,<GET https://www.wikipedia.org/> (The Free En...,2019-10-29 9:06:03
1,2019-10-29 19:56:03,english,<GET https://en.wikipedia.org/wiki/Main_Page>...,2019-10-31 11:16:43
2,2019-10-29 19:56:03,italiano,<GET https://it.wikipedia.org/wiki/Pagina_pri...,2019-10-30 21:15:23
3,2019-10-30 19:56:03,português,<GET https://pt.wikipedia.org/wiki/Wikip%C3%A...,2019-10-30 20:26:35


## Step 2: Pandas: Verify columns containing dates

In [2]:
df.dtypes

loading_datetime    object
pages               object
title               object
datetime_col        object
dtype: object

In [3]:
df.datetime_col

0      2019-10-29 9:06:03
1     2019-10-31 11:16:43
2     2019-10-30 21:15:23
3     2019-10-30 20:26:35
Name: datetime_col, dtype: object

In [4]:
# Ask read_csv to parse the listed columns as datetimes while loading
date_cols = ['loading_datetime']
df = pd.read_csv("../csv/data.csv", parse_dates=date_cols)

In [5]:
df.dtypes

loading_datetime    datetime64[ns]
pages                       object
title                       object
datetime_col                object
dtype: object
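
When it is not obvious which columns hold dates, one option is to try parsing each text column and keep those where every value converts. The following is a minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS` layout seen above; the frame `df_demo` and the helper `looks_like_dates` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical sample frame standing in for the CSV used in this notebook
df_demo = pd.DataFrame({
    "pages": ["main", "english", "italiano"],
    "datetime_col": [
        "2019-10-29 09:06:03",
        "2019-10-31 11:16:43",
        "2019-10-30 21:15:23",
    ],
})


def looks_like_dates(series, fmt="%Y-%m-%d %H:%M:%S"):
    """Return True when every value in the column parses with the given format."""
    # errors="coerce" turns unparseable values into NaT instead of raising
    parsed = pd.to_datetime(series, format=fmt, errors="coerce")
    return bool(parsed.notna().all())


# Collect the names of the columns whose values all parse as datetimes
date_like = [col for col in df_demo.columns if looks_like_dates(df_demo[col])]
print(date_like)
```

The resulting list can then be passed to `parse_dates` in `pd.read_csv`.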

## Step 3: Convert string to datetime in DataFrame

In [6]:
# Convert the string column to datetime64
df['datetime_col'] = pd.to_datetime(df['datetime_col'])

In [7]:
df.dtypes

loading_datetime    datetime64[ns]
pages                       object
title                       object
datetime_col        datetime64[ns]
dtype: object

In [8]:
# utc=True makes the column timezone-aware (UTC)
df['datetime_col'] = pd.to_datetime(df['datetime_col'], utc=True)

In [9]:
df.dtypes

loading_datetime         datetime64[ns]
pages                            object
title                            object
datetime_col        datetime64[ns, UTC]
dtype: object
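
Once a column is timezone-aware, the bounds it is compared against need to be timezone-aware too. The sketch below, using a small hypothetical Series `s`, illustrates that an ordering comparison between a UTC column and a naive `Timestamp` raises a `TypeError`, while a UTC bound works.

```python
import pandas as pd

# Hypothetical timezone-aware values, similar to datetime_col after utc=True
s = pd.Series(pd.to_datetime(["2019-10-30 21:15:23", "2019-10-31 11:16:43"], utc=True))

naive_bound = pd.Timestamp("2019-10-31")            # no timezone attached
aware_bound = pd.Timestamp("2019-10-31", tz="UTC")  # same instant, in UTC

# Ordering comparisons between tz-aware data and a tz-naive Timestamp raise TypeError
try:
    s > naive_bound
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

# With a tz-aware bound the comparison returns a boolean Series
aware_mask = s > aware_bound
print(mixed_comparison_ok, aware_mask.tolist())
```

This is why the selection cells in the next step build `start_date` and `end_date` with `utc=True`.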

## Step 4: Select rows between two dates

#### 1. Select rows based on dates with loc

In [10]:
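# Bounds must be timezone-aware (UTC) to match datetime_col
start_date = pd.to_datetime('2019-10-30 20:41', utc=True)
end_date = pd.to_datetime('2020-05-13 08:55', utc=True)

# Strict comparisons: rows equal to either bound are excluded
df.loc[(df['datetime_col'] > start_date) & (df['datetime_col'] < end_date)]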

Unnamed: 0,loading_datetime,pages,title,datetime_col
1,2019-10-29 19:56:03,english,<GET https://en.wikipedia.org/wiki/Main_Page>...,2019-10-31 11:16:43+00:00
2,2019-10-29 19:56:03,italiano,<GET https://it.wikipedia.org/wiki/Pagina_pri...,2019-10-30 21:15:23+00:00
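
An alternative to a boolean condition is to put the dates in the index and slice with `loc`. This is a sketch on a hypothetical frame `demo` that reuses the timestamps shown above; it assumes the index is sorted, and note that label slices include both endpoints.

```python
import pandas as pd

# Hypothetical frame indexed by the timestamps from the sample data
demo = pd.DataFrame(
    {"pages": ["main", "english", "italiano", "português"]},
    index=pd.to_datetime(
        [
            "2019-10-29 09:06:03",
            "2019-10-31 11:16:43",
            "2019-10-30 21:15:23",
            "2019-10-30 20:26:35",
        ],
        utc=True,
    ),
).sort_index()  # slicing by label expects a sorted (monotonic) index

start = pd.Timestamp("2019-10-30 20:41", tz="UTC")
end = pd.Timestamp("2020-05-13 08:55", tz="UTC")

# Label-based slice on the DatetimeIndex; both endpoints are inclusive
subset = demo.loc[start:end]
print(subset)
```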


#### 2. Series method between

In [None]:
start_date = pd.to_datetime('2019-10-30 20:41', utc=True)
end_date = pd.to_datetime('2020-05-13 08:55', utc=True)

# between() includes both bounds by default
df[df['datetime_col'].between(start_date, end_date)]
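
Whether the endpoints count can matter when a row falls exactly on a bound. The sketch below uses a hypothetical Series `times` and assumes pandas 1.3 or later, where the `inclusive` parameter of `Series.between` accepts the strings `"both"`, `"neither"`, `"left"` and `"right"`.

```python
import pandas as pd

# Hypothetical values; the first and last coincide with the bounds below
times = pd.Series(
    pd.to_datetime(
        ["2019-10-30 20:41:00", "2019-10-30 21:15:23", "2019-10-31 11:16:43"],
        utc=True,
    )
)
lower = times.iloc[0]
upper = times.iloc[2]

# Default behaviour: both endpoints are included
closed_mask = times.between(lower, upper)

# Exclude both endpoints (string values assume pandas >= 1.3)
open_mask = times.between(lower, upper, inclusive="neither")

print(closed_mask.tolist(), open_mask.tolist())
```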

#### 3. Select rows between two times

In [11]:
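# between_time needs a DatetimeIndex; set_index returns a new DataFrame
df2 = df.set_index('datetime_col')
# Keep rows whose time of day falls between 21:10 and 23:50, on any date
df2.between_time('21:10', '23:50')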

datetime_col,loading_datetime,pages,title
2019-10-30 21:15:23+00:00,2019-10-29 19:56:03,italiano,<GET https://it.wikipedia.org/wiki/Pagina_pri...
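
`between_time` looks only at the time of day, so a window can also wrap around midnight by passing a start time later than the end time. The following is a sketch on a hypothetical frame `frame` with made-up page labels.

```python
import pandas as pd

# Hypothetical frame with a timezone-aware DatetimeIndex
frame = pd.DataFrame(
    {"pages": ["a", "b", "c", "d"]},
    index=pd.to_datetime(
        [
            "2019-10-29 09:06:03",
            "2019-10-30 21:15:23",
            "2019-10-30 23:30:00",
            "2019-10-31 00:30:00",
        ],
        utc=True,
    ),
)

# Start later than end: the window runs from 23:00 through midnight to 01:00
night = frame.between_time("23:00", "01:00")
print(night)
```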


#### 4. Select rows based on dates without loc

In [None]:
df[(df['datetime_col'] > '2018-12-02') & (df['datetime_col'] <= '2018-12-03 23:26:10+00:00')]
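
#### 5. Use mask to mark the records

The outline lists a mask-based approach. One way to read it is to store the boolean condition in a variable, write it to a flag column, and reuse it for filtering. This is only a sketch under that assumption; the frame `records` and the column name `in_range` are hypothetical.

```python
import pandas as pd

# Hypothetical frame reusing the timestamps from the sample data
records = pd.DataFrame({
    "pages": ["main", "english", "italiano", "português"],
    "datetime_col": pd.to_datetime(
        [
            "2019-10-29 09:06:03",
            "2019-10-31 11:16:43",
            "2019-10-30 21:15:23",
            "2019-10-30 20:26:35",
        ],
        utc=True,
    ),
})

start_date = pd.Timestamp("2019-10-30 20:41", tz="UTC")
end_date = pd.Timestamp("2020-05-13 08:55", tz="UTC")

# Boolean mask: True where the timestamp lies strictly between the bounds
mask = (records["datetime_col"] > start_date) & (records["datetime_col"] < end_date)

# Mark every record with the result, then reuse the same mask to filter
records["in_range"] = mask
selected = records[mask]
print(records[["pages", "in_range"]])
```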

#### 6. Select records from last month/30 days 

In [12]:
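# Treat 2019-11-30 as the reference date and keep rows from the 30 days before it onward
reference_date = pd.to_datetime('2019-11-30', utc=True)
df[df["datetime_col"] >= reference_date - pd.Timedelta(days=30)]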

Unnamed: 0,loading_datetime,pages,title,datetime_col
1,2019-10-29 19:56:03,english,<GET https://en.wikipedia.org/wiki/Main_Page>...,2019-10-31 11:16:43+00:00
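
To make the window relative to the current moment rather than a fixed date, the reference can be taken from `pd.Timestamp.now(tz="UTC")`. The sketch below builds a hypothetical frame `events` with timestamps relative to now, then compares a fixed 30-day window with a calendar-month window based on `pd.DateOffset`.

```python
import pandas as pd

now = pd.Timestamp.now(tz="UTC")

# Hypothetical events that happened 1, 10 and 45 days ago
events = pd.DataFrame({
    "datetime_col": [now - pd.Timedelta(days=d) for d in (1, 10, 45)],
})

# Fixed-length window: exactly 30 days back from now
cutoff_30_days = now - pd.Timedelta(days=30)
last_30_days = events[events["datetime_col"] >= cutoff_30_days]

# Calendar-based window: same day of the previous month
cutoff_month = now - pd.DateOffset(months=1)
last_month = events[events["datetime_col"] >= cutoff_month]

print(len(last_30_days), len(last_month))
```

`Timedelta(days=30)` always spans the same duration, whereas `DateOffset(months=1)` follows the calendar, so the two cutoffs can differ by a few days.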
