In [2]:
import pandas as pd
pd.__version__ , pd.__path__

('2.0.3', ['C:\\Users\\FashN\\anaconda3\\Lib\\site-packages\\pandas'])

## Overview of Time Series Data
#### What is Time Series Data?
- Time series data, also referred to as time-stamped data, is a sequence of data points recorded at specific intervals of time (monthly, daily, hourly, and so on).
- These data points are analyzed to forecast the future.
- It is time dependent.
- Time series data is affected by four components:
    - **Trend:** A long-term increase or decrease in the series that persists over a long period of time. For example, the population growth of a country over the years.
    - **Seasonality:** Regular patterns of up and down fluctuations that repeat over a fixed period, e.g., sales of ice cream increase every summer.
    - **Cyclicity:** Rises and falls that recur at irregular intervals. For example, 5 years of economic growth, followed by 3 years of recession, followed by 7 years of economic growth, followed by 1 year of recession.
    - **Irregularity:** Variations that occur due to unpredictable factors and do not repeat in any particular pattern. For example, fluctuations caused by earthquakes, floods, wars, etc.
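
To make the components concrete, here is a minimal sketch that builds a synthetic monthly series as the sum of a trend, a seasonal pattern, and irregular noise. All numbers (the base level of 100, the slope, the amplitude, the noise scale) and the name `sales` are made up for illustration, and cyclicity is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Monthly index covering four years (48 points); "MS" means month start
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
t = np.arange(len(idx))

trend = 100 + 0.5 * t                           # steady long-term increase
seasonality = 10 * np.sin(2 * np.pi * t / 12)   # pattern repeating every 12 months
rng = np.random.default_rng(seed=0)             # fixed seed for reproducibility
irregular = rng.normal(loc=0, scale=2, size=len(idx))  # unpredictable noise

series = pd.Series(trend + seasonality + irregular, index=idx, name="sales")
print(series.head())
```

Plotting `series` (for example with `series.plot()`) should show the upward drift with a yearly wave on top of it.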
   
#### What is Time Series Analysis?
- Time series analysis is the use of statistical methods to analyze time series data and extract meaningful statistics and characteristics about the data. Time series analysis helps identify trends, cycles, and seasonal variances to aid in the forecasting of a future event.
- Time series analysis can be useful to see how a given variable changes over time (while time itself, in time series data, is often the independent variable). Time series analysis can also be used to examine how the changes associated with the chosen data point compare to shifts in other variables over the same time period.

In [4]:
import time

In [5]:
# ctime(0) renders the UNIX epoch in local time, so the hour shown equals the machine's UTC offset
# (05:00 under PKT, which is UTC+5; the output below comes from a machine at UTC+1)
dtg1 = time.ctime(0)
dtg1

'Thu Jan  1 01:00:00 1970'
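
Because `time.ctime()` depends on the machine's time zone, it can help to look at the same instant in UTC. A small sketch using `time.gmtime()` and `time.strftime()` from the standard library; the variable names are illustrative.

```python
import time

# gmtime(0) is the epoch expressed in UTC, independent of the local time zone
epoch_utc = time.gmtime(0)
epoch_utc_str = time.strftime("%Y-%m-%d %H:%M:%S", epoch_utc)
print(epoch_utc_str)    # 1970-01-01 00:00:00

# ctime(0) is the same instant rendered in local time, so it varies by machine
print(time.ctime(0))
```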

In [6]:
# If you pass the seconds elapsed since the UNIX epoch to `ctime()`, it returns the current date and time
seconds = time.time()
dtg2 = time.ctime(seconds)
dtg2

'Fri Dec 22 18:03:56 2023'

## b. Python Datetime Module
The `datetime` module supports many of the same operations as the `time` module, but provides a more object-oriented set of types and also has limited support for time zones.

In [7]:
# use dir() to list the names defined in the datetime module
import datetime
print(dir(datetime))

['MAXYEAR', 'MINYEAR', 'UTC', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'date', 'datetime', 'datetime_CAPI', 'sys', 'time', 'timedelta', 'timezone', 'tzinfo']


In [9]:
dtg = datetime.datetime(2023,12,22)
print(dtg)
print(type(dtg))

2023-12-22 00:00:00
<class 'datetime.datetime'>


In [10]:
print(datetime.datetime(2021, 12, 31, 4, 30, 54, 678))

2021-12-31 04:30:54.000678


**(ii) The `time([hour[, minute[, second[, microsecond[, tzinfo]]]]])` constructor returns a time object. All arguments are optional.**

In [13]:
# Named `t` rather than `time` so the imported `time` module is not shadowed
t = datetime.time(10, 15, 23)
print(t)
print(type(t))

10:15:23
<class 'datetime.time'>
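
A `time` object carries no date of its own. The sketch below reads its attributes and pairs it with a `date` through `datetime.datetime.combine()`; the variable names are illustrative.

```python
import datetime

t = datetime.time(10, 15, 23)
print(t.hour, t.minute, t.second)      # 10 15 23

# Attach the time of day to a calendar date to get a full datetime
d = datetime.date(2023, 12, 22)
combined = datetime.datetime.combine(d, t)
print(combined)                        # 2023-12-22 10:15:23
```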


**(iii) You can explore some commonly used attributes of the `<class 'datetime.datetime'>`.**
- `dtg.year:` returns the year
- `dtg.month:` returns the month
- `dtg.day:` returns the day of the month
- `dtg.hour:` returns the hour
- `dtg.minute:` returns the minutes
- `dtg.second:` returns the seconds

In [30]:
dtg = datetime.datetime(2023, 12, 31, 4, 25, 58)
print(dtg)
print(type(dtg))

2023-12-31 04:25:58
<class 'datetime.datetime'>


In [22]:
dtg.year

2023

In [23]:
dtg.month

12

In [28]:
dtg.day

31

In [31]:
dtg.hour

4

In [32]:
dtg.minute

25

In [33]:
dtg.second

58

- Since noon happens at different times in different parts of the world, the world is divided into time zones.
- On Mac, Linux, and Windows operating systems, the information about these time zones is kept in files.
- Let me show you the contents of these files on my Mac system

In [36]:
# The UNIX epoch rendered in local time: the hour shown equals the machine's UTC offset
# (UTC, Coordinated Universal Time, is the successor to Greenwich Mean Time)
import time, datetime

dgt = time.ctime(0)
dgt

'Thu Jan  1 01:00:00 1970'
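
The offset between local time and UTC can also be made explicit with timezone-aware `datetime` objects. A minimal sketch, assuming a fixed +05:00 offset as a stand-in for PKT (a fixed offset is a simplification; it does not model daylight saving or historical changes):

```python
import datetime

# The epoch as a timezone-aware datetime in UTC
epoch_utc = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

# A fixed +05:00 offset, used here as a stand-in for PKT
pkt = datetime.timezone(datetime.timedelta(hours=5), name="PKT")
epoch_pkt = epoch_utc.astimezone(pkt)

print(epoch_utc)               # 1970-01-01 00:00:00+00:00
print(epoch_pkt)               # 1970-01-01 05:00:00+05:00
print(epoch_utc == epoch_pkt)  # True: same instant, different wall-clock reading
```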

## 2. Overview of Pandas Time Series Data Structures
- **Timestamp & DatetimeIndex:**
    - A `Timestamp` refers to a particular moment in time, e.g., 28 July, 1969 at 11:00 am
    - It is the pandas replacement for Python's built-in `datetime.datetime` object
    - The `pd.to_datetime()` function is used to create a `Timestamp` object
    - The `pd.date_range()` function is used to generate a `DatetimeIndex` object
- **Period & PeriodIndex:**
    - A `Period` refers to a span of time between a start and an end point, with each interval of uniform length
    - The `pd.Period()` constructor is used to create a `Period` object; an existing `Timestamp` can be converted with its `to_period()` method (there is no top-level `pd.to_period()`)
    - The `pd.period_range()` function is used to create a `PeriodIndex`
- **Timedelta & TimedeltaIndex:**
    - A `Timedelta`, or duration, refers to an exact length of time, e.g., a duration of 235.54 seconds
    - A `Timedelta` is created when you subtract two `Timestamp` objects, while a `TimedeltaIndex` is created when you subtract a `Timestamp` (or another `DatetimeIndex`) from a `DatetimeIndex`
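
The six structures listed above can be created in a few lines. This is a minimal sketch; the dates and variable names are arbitrary and chosen only for illustration.

```python
import pandas as pd

ts = pd.Timestamp("1969-07-28 11:00")                    # a single moment in time
dti = pd.date_range("2023-01-01", periods=3, freq="D")   # DatetimeIndex

p = pd.Period("2023-01", freq="M")                       # the whole month of January 2023
pi = pd.period_range("2023-01", periods=3, freq="M")     # PeriodIndex

td = pd.Timestamp("2023-01-10") - pd.Timestamp("2023-01-04")  # Timedelta of 6 days
tdi = dti - pd.Timestamp("2023-01-01")                         # TimedeltaIndex

for obj in (ts, dti, p, pi, td, tdi):
    print(type(obj).__name__)
```

The loop prints the class name of each object, which makes it easy to check which operation yields which structure.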

## 3. Converting Strings to Pandas Timestamp Object
- The pandas `pd.to_datetime()` function converts its only required argument, `arg`, to a `Timestamp` (for a scalar), a `DatetimeIndex` (for a list-like), or a datetime `Series` (for a Series).

```
pd.to_datetime(arg, format=None, errors='raise', unit=None, origin='unix')
```
- Where,
    - `arg` can be a string, Series, int, datetime, list, tuple, 1-d array, DataFrame/dict-like object to convert
    - `errors` {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
        - If `raise`, then invalid parsing will raise an exception.
        - If `coerce`, then invalid parsing will be set as NaT.
        - If `ignore`, then invalid parsing will return the input
    - `format`: a `strftime`-style format string describing how `arg` is laid out; needed when the layout is ambiguous or cannot be inferred
    - `unit`: the unit of `arg` when it is numeric (`D`, `s`, `ms`, `us`, `ns`), counted from `origin`; the default is `ns`
    - `origin`: the reference point from which the units are counted. The default value of `origin` is the UNIX epoch.

In [41]:
import pandas as pd
time_str = "2023-12-22 08:33:12"
print(time_str)
print(type(time_str))

2023-12-22 08:33:12
<class 'str'>


In [43]:
time_stamp = pd.to_datetime(time_str)
print(time_stamp)
print(type(time_stamp))

2023-12-22 08:33:12
<class 'pandas._libs.tslibs.timestamps.Timestamp'>


**`pd.Timestamp Attributes`**

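Here `ts` is a `Timestamp`; the same attributes are available on a datetime Series through the `.dt` accessor.

`ts.second` (`Series.dt.second`): Returns the seconds

`ts.minute`: Returns the minutes

`ts.hour`: Returns the hour

`ts.day`: Returns the day of the month

`ts.month`: Returns the month as a number (January=1, December=12)

`ts.year`: Returns the year

`ts.day_name()` (`Series.dt.day_name()`): Returns the name of the day as a string

`ts.month_name()` (`Series.dt.month_name()`): Returns the name of the month as a string

For details, read: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html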

>You can pass a list of strings containing dates to `pd.to_datetime()`, which will return a `DatetimeIndex` object
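
A short sketch of that behaviour, followed by the `.dt` accessor on a Series built from the result. The three dates are arbitrary examples.

```python
import pandas as pd

dates = ["2023-12-22", "2023-12-23", "2023-12-24"]

dti = pd.to_datetime(dates)          # list in, DatetimeIndex out
print(type(dti).__name__)            # DatetimeIndex

s = pd.Series(dti)                   # datetime64 Series, so .dt is available
print(s.dt.day_name().tolist())      # ['Friday', 'Saturday', 'Sunday']
print(s.dt.year.tolist())            # [2023, 2023, 2023]
```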

In [44]:
time_stamp.year

2023

In [47]:
time_stamp.month


12

In [48]:
time_stamp.quarter

4

In [49]:
time_stamp.minute


33

In [50]:
# If there is an invalid string that cannot be converted to a valid date, you will get an error
#pd.to_datetime(['2017-01-05', 'Jan 6, 2017', 'abc'])

In [51]:
# Use `errors='coerce'` to get `NaT` for every string that cannot be parsed.
# pandas 2.x infers one format from the first element, so 'Jan 6, 2017' also becomes `NaT` here
pd.to_datetime(['2017-01-05', 'Jan 6, 2017', 'abc'], errors='coerce')

DatetimeIndex(['2017-01-05', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

In [52]:
pd.to_datetime(['2023-12-22', 'Dec 22, 2023', 'date'], errors = 'coerce')

DatetimeIndex(['2023-12-22', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)
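
In the two outputs above, the second string is valid but still becomes `NaT`, because it does not match the format inferred from the first element. A sketch of one way around this, assuming pandas 2.0 or later (where `format='mixed'` is available): combine `format='mixed'` with `errors='coerce'` so that only the unparseable string is lost.

```python
import pandas as pd

raw = ['2017-01-05', 'Jan 6, 2017', 'abc']

# Each element is parsed on its own; only the invalid one is coerced to NaT
result = pd.to_datetime(raw, format='mixed', errors='coerce')
print(result)

# Count how many entries failed to parse
print(result.isna().sum())
```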

### b. Convert Pandas Series of Strings to Series of Timestamps

In [74]:
# A pandas series having same date but in different formats
dates = pd.Series(['2022-03-06 08:30', '2022/03/06 08:30', '6 March, 2022 08:30', 'Mar 06, 2022 08:30', '202203060830'])
print(type(dates))
dates

<class 'pandas.core.series.Series'>


0       2022-03-06 08:30
1       2022/03/06 08:30
2    6 March, 2022 08:30
3     Mar 06, 2022 08:30
4           202203060830
dtype: object

In [77]:
# date = ['2022-03-06 08:30', '2022/03/06 08:30', '6 March, 2022 08:30', 'Mar 06, 2022 08:30', '202203060830']

In [82]:
# With pandas 2.x, to_datetime() infers one format from the first element,
# so a Series with mixed formats raises a ValueError
convert = pd.to_datetime(dates)
convert

ValueError: time data "2022/03/06 08:30" doesn't match format "%Y-%m-%d %H:%M", at position 1. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
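
Following the hint in the error message, here is a minimal sketch that uses `format='mixed'` (pandas 2.0 or later) on the first four strings. The compact string `'202203060830'` is handled separately with an explicit format, since spelling out its layout removes any guesswork.

```python
import pandas as pd

dates = pd.Series(['2022-03-06 08:30', '2022/03/06 08:30',
                   '6 March, 2022 08:30', 'Mar 06, 2022 08:30'])

# format='mixed' infers the format of each element individually
converted = pd.to_datetime(dates, format='mixed')
print(converted)

# The compact form is unambiguous once its layout is spelled out
compact = pd.to_datetime('202203060830', format='%Y%m%d%H%M')
print(compact)      # 2022-03-06 08:30:00
```

Inferring the format per element is slower and riskier than a single explicit `format`, so it is best kept for genuinely mixed input.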

### c. Handling Issues of DateTime Formats
Even when `pd.to_datetime()` parses a string without raising an error, the result may not be the date you intended. Let us try storing 6 March, 2022 as '06/03/2022' or '06-03-2022'

**(i) Problem 1:**

In [84]:
dt = pd.to_datetime('06-03-2022')
dt

Timestamp('2022-06-03 00:00:00')

In [85]:
dt.day, dt.month

(3, 6)

**Oops!** The pandas `to_datetime()` function has converted the string to a datetime, but interpreted it as 3 June 2022.
>By default, `pd.to_datetime()` parses an ambiguous string month first (MM/DD, MM DD, or MM-DD).
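
For strings that are consistently day first, the `dayfirst` parameter of `pd.to_datetime()` is one way to change this default. A small sketch comparing both interpretations of the same string; the variable names are illustrative.

```python
import pandas as pd

default = pd.to_datetime('06-03-2022')                    # month first: 3 June 2022
day_first = pd.to_datetime('06-03-2022', dayfirst=True)   # day first: 6 March 2022

print(default.day, default.month)       # 3 6
print(day_first.day, day_first.month)   # 6 3
```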

In [86]:
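# 23 cannot be a month, so pandas falls back to day-first parsing here
# and emits a UserWarning about the inconsistency with the month-first default
dt = pd.to_datetime('23-03-2023')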


In [87]:
dt.day, dt.month

(23, 3)

**(ii) Problem 2:**

In [89]:
# dt = pd.to_datetime('2022-03-06 08-PM')

**Oops again!** The pandas `to_datetime()` function has raised an error saying `ParserError: Unknown string format: 2022-03-06 08-PM`
>It seems that the default parser cannot interpret this layout, in which the hour is joined by a hyphen to the AM (ante meridiem, meaning before midday) or PM (post meridiem, meaning after midday) marker.

**(iii) Solution of above two Problems:**
>Pass an appropriate format string to the `format` argument of `pd.to_datetime()`. The format string must match the layout of the date string.
See this link for the format codes: https://pandas.pydata.org/docs/reference/api/pandas.Period.strftime.html

In [96]:
# Passing appropriate format string will resolve above two problems
dt = pd.to_datetime("06-03-2023 08-PM", format = "%d-%m-%Y %I-%p")
dt

Timestamp('2023-03-06 20:00:00')

In [97]:
dt.day, dt.month

(6, 3)

### d. Convert a Single Integer to Pandas Timestamp
- The pandas `pd.to_datetime()` function can also convert an integer passed as its first argument to a pandas `Timestamp` object.
- The `unit` argument gives the unit of `arg`, for example days (`D`), seconds (`s`), or milliseconds (`ms`).
- The `origin` argument is the reference point from which the units are counted. The default value of `origin` is the UNIX epoch.
```
pd.to_datetime(arg, format=None, unit=None, origin='unix')
```

In [99]:
dt = pd.to_datetime(10, unit='D', origin = '2023-01-04')
dt

Timestamp('2023-01-14 00:00:00')

In [100]:
dt = pd.to_datetime(1645594235, unit='ms', origin='unix')
dt

Timestamp('1970-01-20 01:06:34.235000')

In [102]:
dt = pd.to_datetime(19000000, unit = 'ms', origin = 'unix')
dt

Timestamp('1970-01-01 05:16:40')
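
One way to read `unit` and `origin` is as plain arithmetic: the result equals the origin plus a `Timedelta` of `arg` units. The sketch below checks this for the first example, and also shows how the choice of unit changes the result for the same integer (seconds instead of milliseconds). The variable names are illustrative.

```python
import pandas as pd

# unit + origin is equivalent to adding a Timedelta to the origin timestamp
a = pd.to_datetime(10, unit='D', origin='2023-01-04')
b = pd.Timestamp('2023-01-04') + pd.Timedelta(10, unit='D')
print(a == b)    # True

# The same integer read as seconds rather than milliseconds lands in 2022
c = pd.to_datetime(1645594235, unit='s')
print(c)         # 2022-02-23 05:30:35
```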