<img src="https://pandas.pydata.org/static/img/pandas.svg" width="250">

## <center>Strings and Time Series</center>

In [2]:
import pandas as pd

### Strings

In [3]:
# Creating a Pandas series

names = pd.Series(['Pomeray, CODY ',' Wagner; Jarry','smith, Ray'])

In [4]:
names

0    Pomeray, CODY 
1     Wagner; Jarry
2        smith, Ray
dtype: object

In [5]:
# Replacing semicolons with commas so every name uses the same separator

names = names.str.replace(';',',')
names

0    Pomeray, CODY 
1     Wagner, Jarry
2        smith, Ray
dtype: object

In [6]:
# Changing to lowercase letters

names = names.str.lower()
names

0    pomeray, cody 
1     wagner, jarry
2        smith, ray
dtype: object

In [7]:
# Splitting each name into a list of [last name, first name]

names = names.str.split(', ')
names

0    [pomeray, cody ]
1    [ wagner, jarry]
2        [smith, ray]
dtype: object

In [8]:
# Swapping the order of first and last name

names = pd.Series([i[::-1] for i in names])
names

0    [cody , pomeray]
1    [jarry,  wagner]
2        [ray, smith]
dtype: object

In [9]:
# Joining each [first name, last name] pair into one string separated by a space
# (whitespace left over from the original strings is kept)

names = [' '.join(i) for i in names] 
names

['cody  pomeray', 'jarry  wagner', 'ray smith']
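The result above still carries the stray spaces from the raw strings (note the double space in 'cody  pomeray'). As a minimal sketch, not the notebook's own method, the same cleanup can be done entirely with vectorized `.str` methods plus `str.strip`. The variable names `raw`, `parts`, `first`, `last`, and `clean` are introduced here for illustration only.

```python
import pandas as pd

raw = pd.Series(['Pomeray, CODY ', ' Wagner; Jarry', 'smith, Ray'])

# Normalize the separator and the case, then split into two columns (0 = last, 1 = first).
parts = (
    raw.str.replace(';', ',', regex=False)
       .str.lower()
       .str.split(',', expand=True)
)

# Strip leading/trailing whitespace from each piece before recombining as "first last".
last = parts[0].str.strip()
first = parts[1].str.strip()
clean = first + ' ' + last

print(clean.tolist())
# ['cody pomeray', 'jarry wagner', 'ray smith']
```

Keeping the data in a Series (rather than a Python list, as in the last cell) means further `.str` operations remain available.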

### Time Series

In [10]:
# Specifying details for our sample date range

daterange = pd.period_range('1/1/2020', freq='30d', periods=4)
daterange

PeriodIndex(['2020-01-01', '2020-01-31', '2020-03-01', '2020-03-31'], dtype='period[30D]')

In [11]:
# Putting our sample dates into a DataFrame

date_df = pd.DataFrame(data=daterange,columns=['sample date'])
date_df

| | sample date |
|---|---|
| 0 | 2020-01-01 |
| 1 | 2020-01-31 |
| 2 | 2020-03-01 |
| 3 | 2020-03-31 |


In [12]:
# Computing the difference between consecutive dates with the diff function

date_df['date difference'] = date_df['sample date'].diff(periods=1)
date_df

| | sample date | date difference |
|---|---|---|
| 0 | 2020-01-01 | NaT |
| 1 | 2020-01-31 | `<30 * Days>` |
| 2 | 2020-03-01 | `<30 * Days>` |
| 3 | 2020-03-31 | `<30 * Days>` |
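On a period column, `diff` returns offset objects such as `<30 * Days>`, which is why the dtype check later in this notebook reports `object` for this column. For comparison, here is a small sketch of the same operation on a timestamp column, where the result is a proper timedelta column. The names `stamps` and `gaps` are made up for this example.

```python
import pandas as pd

# Same four dates as above, but built as timestamps instead of periods.
stamps = pd.Series(pd.date_range('2020-01-01', periods=4, freq='30D'))

# diff() subtracts the previous row; the first row has no predecessor, so it is NaT.
gaps = stamps.diff()

print(gaps.dtype)
print(gaps)
```

A timedelta column supports the `.dt` accessor (for example `gaps.dt.days`), which the object column of offsets does not.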


In [14]:
# Adding a new column that contains the first day of each sample date's month

date_df['first of month'] = date_df['sample date'].values.astype('datetime64[M]')
date_df

| | sample date | date difference | first of month |
|---|---|---|---|
| 0 | 2020-01-01 | NaT | 2020-01-01 |
| 1 | 2020-01-31 | `<30 * Days>` | 2020-01-01 |
| 2 | 2020-03-01 | `<30 * Days>` | 2020-03-01 |
| 3 | 2020-03-31 | `<30 * Days>` | 2020-03-01 |
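The cell above gets the first of the month by casting the raw values to month precision. An alternative sketch, assuming the dates are already timestamps, stays within the pandas `.dt` accessor: convert to a monthly period and back to a timestamp. The names `dates` and `first_of_month` are illustrative only.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2020-01-01', periods=4, freq='30D'))

# Convert each timestamp to a monthly period, then back to the start of that period.
first_of_month = dates.dt.to_period('M').dt.to_timestamp()

print(first_of_month.dt.strftime('%Y-%m-%d').tolist())
# ['2020-01-01', '2020-01-01', '2020-03-01', '2020-03-01']
```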


### Data Types

In [18]:
# Quickly checking the data types of each column

date_df.dtypes

sample date           period[30D]
date difference            object
first of month     datetime64[ns]
dtype: object

#### Converting to Datetime64 format

The **datetime64** data type is a **NumPy** data type that stores **dates and times at a fixed precision set by its unit**. It is called datetime64 because the name datetime is already taken by the Python standard library. The available units range from years down to attoseconds (10^-18 seconds); the columns in this notebook use nanosecond precision, shown as datetime64[ns].
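To make the idea of a unit concrete, the following sketch casts one NumPy timestamp to a few different units. The example value and the variable names are chosen here for illustration.

```python
import numpy as np

# A timestamp written to the second is stored with second precision.
stamp = np.datetime64('2020-01-31T12:34:56')
print(stamp.dtype)            # datetime64[s]

# Casting to a coarser unit truncates the value to that unit.
as_month = stamp.astype('datetime64[M]')
as_day = stamp.astype('datetime64[D]')
print(as_month, as_day)       # 2020-01 2020-01-31

# Casting to a finer unit keeps the value and adds precision.
as_ms = stamp.astype('datetime64[ms]')
print(as_ms.dtype)            # datetime64[ms]
```

This is the same truncation that the 'first of month' cell relies on when it casts to `datetime64[M]`.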

In [20]:
# Converting the period column to datetime64 timestamps

date_df['sample date'] = date_df['sample date'].dt.to_timestamp()
date_df.dtypes

sample date        datetime64[ns]
date difference            object
first of month     datetime64[ns]
dtype: object

#### Date Subtraction

In [21]:
# Date subtraction is as simple as ordinary arithmetic subtraction

date_df['sample date'] - date_df['first of month']

0    0 days
1   30 days
2    0 days
3   30 days
dtype: timedelta64[ns]

In [22]:
# We can also subtract the 'date difference' column computed earlier

date_df['sample date'] - date_df['date difference']

0          NaT
1   2020-01-01
2   2020-01-31
3   2020-03-01
dtype: datetime64[ns]

#### Timedelta

**Timedelta** is a class in pandas that **represents a duration, that is, the difference between two dates or times**. It is the pandas equivalent of Python's datetime.timedelta and is interchangeable with it in most cases.
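A short sketch of what "interchangeable" means in practice: a pandas Timedelta is also an instance of the standard library's `datetime.timedelta`, compares equal to it, and either one can be added to a pandas Timestamp. The names `span` and `start` are illustrative.

```python
import datetime as dt
import pandas as pd

span = pd.Timedelta('30 d')

# pd.Timedelta subclasses datetime.timedelta, so the two compare equal.
print(isinstance(span, dt.timedelta))       # True
print(span == dt.timedelta(days=30))        # True

# Either type can be added to a pandas Timestamp with the same result.
start = pd.Timestamp('2020-01-01')
print(start + span == start + dt.timedelta(days=30))   # True

# Convert explicitly when a plain standard-library object is required.
print(span.to_pytimedelta())                # 30 days, 0:00:00
```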

In [23]:
# Using Timedelta to specify a time span to add to or subtract from your dates

date_df['sample date'] - pd.Timedelta('30 d')

0   2019-12-02
1   2020-01-01
2   2020-01-31
3   2020-03-01
Name: sample date, dtype: datetime64[ns]