In [3]:
import numpy as np
import pandas as pd

### This notebook contains:
1. Date Functionality
2. Time Delta
3. Categorical data

#### Date Functionality

#### Time Series:
- Pandas provides robust tools for working with time series data, which is especially common in the financial sector.
- When working with time series data, we frequently need to:
    - Generate sequences of times
    - Convert a time series to a different frequency
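
As a minimal sketch of both tasks, the snippet below generates a daily sequence and then converts it to a two-day frequency with resample(). The series values and the two-day bin size are made up for illustration.

```python
import pandas as pd

# Generate a daily sequence of six timestamps (illustrative data).
idx = pd.date_range("2020-01-01", periods=6, freq="D")
daily = pd.Series(range(6), index=idx)

# Convert to a two-day frequency by summing the values in each two-day bin.
two_day = daily.resample("2D").sum()
print(two_day)
```

Other aggregations such as mean() or max() can be used in place of sum(), depending on what the lower-frequency series should represent.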


In [4]:
# pd.datetime is deprecated (and removed in pandas 2.0); use the standard library instead
from datetime import datetime
datetime.now()


datetime.datetime(2020, 11, 30, 21, 57, 30, 58243)

#### Create a TimeStamp
- Time-stamped data is the most basic type of time series data; it associates values with points in time.

In [6]:
pd.Timestamp('2020-11-30')

Timestamp('2020-11-30 00:00:00')

In [7]:
pd.Timestamp(year=2020, month=11, day=30, hour=20, minute=30, second=45)

Timestamp('2020-11-30 20:30:45')
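
To illustrate how a Timestamp associates a value with a point in time, here is a small sketch that uses timestamps as the index of a Series. The name readings and its values are hypothetical.

```python
import pandas as pd

ts = pd.Timestamp(year=2020, month=11, day=30, hour=20, minute=30, second=45)

# Individual components are available as attributes.
print(ts.year, ts.month, ts.day, ts.day_name())

# Hypothetical readings, each tied to a point in time.
readings = pd.Series([21.5, 22.1], index=[ts, ts + pd.Timedelta(hours=1)])

# Look up the value recorded at a given timestamp.
first_reading = readings.loc[ts]
print(first_reading)
```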

In [8]:
# Create a Range of Time
print(pd.date_range("11:00", "13:30", freq="30min").time)

[datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0)
 datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)]


In [9]:
# Change the frequency to hourly (the "H" alias is deprecated in pandas 2.2+, where "h" is preferred)
print(pd.date_range("11:00", "13:30", freq="H").time)

[datetime.time(11, 0) datetime.time(12, 0) datetime.time(13, 0)]


In [10]:
# Convert strings to Timestamps using pd.to_datetime()
# (pandas 2.0+ needs format="mixed" when the strings use different formats)
dt = pd.Series(['Jul 31, 2009', '2010-01-10', None])
print(pd.to_datetime(dt))

0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]


NaT stands for Not a Time. It marks a missing datetime value, much as NaN marks a missing number.
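
A small sketch of how NaT behaves like any other missing value: entries that cannot be parsed can be coerced to NaT with errors="coerce" and then detected with isna(). The input strings are invented for illustration.

```python
import pandas as pd

raw = pd.Series(["2009-07-31", "not a date", None])

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

# isna() flags NaT just as it flags NaN.
missing = parsed.isna()
print(parsed)
print(missing.tolist())
```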

#### Create a Range of Dates
- Using the pd.date_range() function and specifying the number of periods and the frequency, we can create a series of dates.
- By default, the frequency is daily ('D').

In [14]:
print(pd.date_range('1/1/2020', periods = 5))

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')


In [16]:
print(pd.date_range('1/1/2011', periods=5, freq='M'))      # month-end frequency (spelled 'ME' in pandas 2.2+)

DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-30',
               '2011-05-31'],
              dtype='datetime64[ns]', freq='M')
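
pd.date_range() also accepts an explicit end date in place of periods, as well as other frequency aliases. The sketch below assumes the business-day alias "B", which skips weekends; the dates are chosen only for illustration.

```python
import pandas as pd

# Start and end dates instead of a number of periods (daily by default).
first_week = pd.date_range("2020-01-01", "2020-01-05")

# Business-day frequency: 2020-01-03 is a Friday, so the weekend is skipped.
business_days = pd.date_range("2020-01-03", periods=3, freq="B")

print(first_week)
print(business_days)
```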


#### Timedelta:

- Timedeltas are differences in times, expressed in different units, for example, days, hours, minutes, seconds. 
- They can be both positive and negative.
- We can create Timedelta objects using various arguments as shown below −

In [19]:
# By passing a 'string' literal, we can create a timedelta object.

print(pd.Timedelta('2 days 2 hours 15 minutes 30 seconds'))

2 days 02:15:30


In [20]:
# By passing an integer value with the unit

print(pd.Timedelta(6,unit='h'))

0 days 06:00:00
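
The following sketch shows Timedelta arithmetic, including a negative result. The variable names and the chosen dates are illustrative.

```python
import pandas as pd

start = pd.Timestamp("2020-11-30 20:30:45")
delta = pd.Timedelta(days=2, hours=2)

# Adding a Timedelta to a Timestamp gives a later Timestamp.
later = start + delta

# Subtracting a later Timestamp from an earlier one gives a negative Timedelta.
diff = start - later

print(later)
print(diff, diff.total_seconds())
```

Subtracting two Timestamps always yields a Timedelta, so diff here equals -delta.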


#### Categorical Data:

- Real-world data often includes text columns whose values repeat.
- Features such as gender, country, and codes take only a limited set of values. These are examples of categorical data.

The categorical data type is useful in the following cases −

- A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory.

- The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order.

- As a signal to other Python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
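
The first two cases can be sketched as follows. The series words is made-up data, and the exact memory figures depend on the pandas version, so only the comparison between them is meaningful.

```python
import pandas as pd

# A string column with only three distinct values (illustrative data).
words = pd.Series(["one", "two", "three"] * 1000)

# Case 1: the categorical version stores small integer codes, so it is smaller.
as_cat = words.astype("category")
print(words.memory_usage(deep=True), as_cat.memory_usage(deep=True))

# Case 2: an explicit order makes max() follow the logical order, not the lexical one.
logical = pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
ordered = words.astype(logical)
print(words.max(), ordered.max())
```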

In [25]:
# Object Creation

print(pd.Series(['a','b','b','d','a'], dtype = 'category'))

0    a
1    b
2    b
3    d
4    a
dtype: category
Categories (3, object): [a, b, d]


In [26]:
# We can also create a categorical using 'pd.Categorical'

print(pd.Categorical(['a','b','b','d','a']))

[a, b, b, d, a]
Categories (3, object): [a, b, d]


In [31]:
c = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], ordered=True, categories=['c', 'b', 'a',])
c

[a, b, c, a, b, c, NaN]
Categories (3, object): [c < b < a]

- The order means that a is greater than b and b is greater than c. The value d is not among the listed categories, so it becomes NaN.

In [34]:
# Get the categories property
c.categories 

Index(['c', 'b', 'a'], dtype='object')

#### Comparison of Categorical Data:

Comparing categorical data with other objects is possible in three cases −
- comparing equality (== and !=) to a list-like object (list, Series, array, ...) of the same length as the categorical data
- all comparisons (==, !=, >, >=, <, and <=) of categorical data to another categorical Series, when ordered==True and the categories are the same.
- all comparisons of categorical data to a scalar.

In [40]:
# astype() takes a dtype object; categories and ordered are set on a CategoricalDtype
cat_type = pd.CategoricalDtype(categories=[1,2,3], ordered=True)
cat = pd.Series([1,2,3]).astype(cat_type)
cat1 = pd.Series([2,2,2]).astype(cat_type)
print(cat > cat1)

0    False
1    False
2     True
dtype: bool
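
The list-like and scalar cases can be sketched in the same way. The names size_type and sizes and the category labels are hypothetical, chosen only for this example.

```python
import pandas as pd

# An ordered categorical dtype: small < medium < large.
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
sizes = pd.Series(["small", "large", "medium"], dtype=size_type)

# Equality against a list-like of the same length.
eq_list = sizes == ["small", "small", "medium"]

# Ordered comparison against a scalar that is one of the categories.
gt_scalar = sizes > "small"

print(eq_list.tolist())
print(gt_scalar.tolist())
```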

#### Appending New Categories:
- Using the Categorical.add_categories() method, new categories can be appended.

In [50]:
s1 = pd.Series(["a","b","c","a"], dtype="category")
s1 = s1.cat.add_categories([4])        # adding a new category
print(s1)

0    a
1    b
2    c
3    a
dtype: category
Categories (4, object): [a, b, c, 4]


In [51]:
print(s1.cat.categories)

Index(['a', 'b', 'c', 4], dtype='object')
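
A newly added category has no rows until a value is assigned to it. A minimal sketch, using a hypothetical new category "d":

```python
import pandas as pd

letters = pd.Series(["a", "b", "c", "a"], dtype="category")

# Register the new category first, then assign it to a row.
letters = letters.cat.add_categories(["d"])
letters.iloc[3] = "d"

print(letters.tolist())
print(list(letters.cat.categories))
```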


#### Removing Categories
- Using the Categorical.remove_categories() method, unwanted categories can be removed.

In [47]:
s = pd.Series(["a","b","c","a"], dtype="category")
print(s)

0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]


In [48]:
# after removing
print(s.cat.remove_categories("a"))

0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (2, object): [b, c]
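
A related situation arises after filtering: the removed values are gone, but their categories remain. The sketch below uses the remove_unused_categories() accessor method to drop them; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Filtering rows keeps the full list of categories.
filtered = s[s != "a"]
before = list(filtered.cat.categories)

# Drop categories that no longer appear in the data.
trimmed = filtered.cat.remove_unused_categories()
after = list(trimmed.cat.categories)

print(before, after)
```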
