In [1]:
import pandas as pd

Generating a time series in pandas means creating a Series or DataFrame whose index is a `DatetimeIndex`. With that index in place, pandas can slice, resample, and align the data by time. Here's how you can generate a time series in pandas:

**Using `pd.date_range()`**:
- The `pd.date_range()` function creates a range of datetime values based on specified parameters such as start date, end date, frequency, and periods.
- For example, `dates = pd.date_range(start='2022-01-01', end='2022-01-31', freq='D')` generates daily datetime values from January 1, 2022, to January 31, 2022.

In [7]:
dates = pd.date_range(start="2022-01-01", end="2022-01-31", freq="D")

dates

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
               '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
               '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12',
               '2022-01-13', '2022-01-14', '2022-01-15', '2022-01-16',
               '2022-01-17', '2022-01-18', '2022-01-19', '2022-01-20',
               '2022-01-21', '2022-01-22', '2022-01-23', '2022-01-24',
               '2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
               '2022-01-29', '2022-01-30', '2022-01-31'],
              dtype='datetime64[ns]', freq='D')
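The `periods` parameter mentioned above can stand in for `end`. The following is a minimal sketch rather than part of the original notebook; the variable names `first_five_days` and `mondays` are illustrative assumptions.

```python
import pandas as pd

# Five consecutive days starting on 2022-01-01; `periods` takes the place of `end`.
first_five_days = pd.date_range(start="2022-01-01", periods=5, freq="D")

# Four weekly timestamps; the "W-MON" alias anchors each one on a Monday.
mondays = pd.date_range(start="2022-01-01", periods=4, freq="W-MON")

print(first_five_days)
print(mondays)
```

Because 2022-01-01 falls on a Saturday, the weekly range starts on the first Monday on or after the start date rather than on the start date itself.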

**Creating a DataFrame with Time Index**:
- After generating datetime values, you can use them as the index for a DataFrame. For instance:

In [8]:
dates = pd.date_range(start="2022-01-01", end="2022-01-31", freq="D")
data = {"value": range(0, 155, 5)}
df = pd.DataFrame(data, index=dates)

df.tail()

Unnamed: 0,value
2022-01-27,130
2022-01-28,135
2022-01-29,140
2022-01-30,145
2022-01-31,150


**Resampling and Frequency Conversion**:
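- You can resample a time series to change its frequency with `resample()`, for example aggregating daily data into monthly data by passing the target frequency.
    ```python
    monthly_data = df.resample('M').mean()
    ```
- This calculates the mean of `value` for each month and returns a new DataFrame with one row per month. In newer pandas releases the month-end alias is spelled `'ME'`, and `'M'` is deprecated there.
- The cell below applies the same idea at weekly frequency, summing `value` over each week ending on Sunday.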

In [13]:
df.resample("W").sum()

Unnamed: 0,value
2022-01-02,5
2022-01-09,175
2022-01-16,420
2022-01-23,665
2022-01-30,910
2022-02-06,150
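As a minimal sketch of the monthly aggregation described above, the example below builds a hypothetical two-month daily series (`daily` is an assumed name, not the notebook's `df`) so that more than one monthly bin appears. It uses the `"MS"` (month start) alias because the month-end alias differs between older and newer pandas releases.

```python
import pandas as pd

# Hypothetical daily series covering January and February 2022.
days = pd.date_range(start="2022-01-01", end="2022-02-28", freq="D")
daily = pd.DataFrame({"value": range(len(days))}, index=days)

# "MS" groups rows by calendar month and labels each bin with the month's first day.
monthly_mean = daily.resample("MS").mean()

print(monthly_mean)
```

Swapping `.mean()` for `.sum()`, `.min()`, or `.max()` changes the aggregation without changing the binning.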


What does this mean in practice? If a time series may have _missing_ dates, you can generate a complete date series and align your data to it, like we did in SQL.
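One way to sketch this idea on a small scale is to reindex sparse data against a generated date range. The series `observed` and its values are made up for illustration, and `reindex` is only one of several ways to do this.

```python
import pandas as pd

# Hypothetical sparse daily counts: 2022-01-02 and 2022-01-04 have no rows.
observed = pd.Series(
    [3, 1, 4],
    index=pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-05"]),
    name="num_events",
)

# Generate the complete calendar, then align the observations to it.
full_range = pd.date_range(observed.index.min(), observed.index.max(), freq="D")
filled = observed.reindex(full_range, fill_value=0)

print(filled)
```

Filling with 0 suits counts; for measurements, leaving the new rows as `NaN` or interpolating may be more appropriate.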

In [21]:
import pandas as pd

alerts_df = pd.read_parquet("../../data/nps/nps_public_data_alerts.parquet")

alerts_df["alert_date"] = pd.to_datetime(alerts_df["lastIndexedDate"]).dt.date
alerts_df

Unnamed: 0,relatedRoadEvents,category,description,title,parkCode,lastIndexedDate,url,id,alert_date
0,[],Danger,Visitors may encounter impacts including trail...,Rocky Branch Fire,shen,2024-03-21 16:40:19+00:00,https://inciweb.nwcg.gov/incident-information/...,B62964C0-2FE6-4BC9-A116-08BEE0483784,2024-03-21
1,[],Danger,The following areas & trails are closed: Guada...,Backcountry and Trail Closures Due to Flood Da...,cave,2024-03-15 12:49:21+00:00,https://www.nps.gov/cave/learn/news/additional...,01058D5C-BCA2-4055-BECD-02A307B5C296,2024-03-15
2,[],Danger,The Delaware River is measuring above 8 feet a...,Winter High Water Warning - Lifejackets Mandatory,dewa,2024-03-06 08:35:37+00:00,https://www.nps.gov/dewa/planyourvisit/river-c...,B1180914-7F03-4190-804D-503D4A0B663E,2024-03-06
3,[],Danger,Recent flooding has caused major damage to the...,Walnut Canyon Desert Drive Closed Due to Flood...,cave,2023-08-02 14:13:44+00:00,https://www.nps.gov/cave/learn/news/flood-dama...,257506D7-38CB-4244-9DB4-F19B29CDD2FC,2023-08-02
4,[],Danger,Be careful and safe while out in the canyon! B...,SEASONAL SAFETY MESSAGE,liri,2023-03-23 14:16:55+00:00,https://www.nps.gov/liri/planyourvisit/psar.htm,2769E44A-3C7E-4703-8702-44862D407814,2023-03-23
...,...,...,...,...,...,...,...,...,...
688,[],Information,"The use of unmanned aircraft, also known as dr...",Unmanned Aircraft (Drones) Are Prohibited,cave,2018-07-11 13:54:39+00:00,https://www.nps.gov/articles/unmanned-aircraft...,4793E7A9-98BB-43C0-9A1C-76DE85370F1C,2018-07-11
689,[],Park Closure,"Beginning Monday, August 14, 2023, constructio...",Rehabilitation of Colonial Parkway beginning A...,jame,2024-03-08 11:32:08+00:00,https://www.nps.gov/colo/learn/news/national-p...,7E4E47B7-31DC-4F73-912E-499E22F8831B,2024-03-08
690,[],Park Closure,"Beginning Monday, August 14, 2023, constructio...",Rehabilitation of Colonial Parkway beginning A...,jame,2024-03-08 11:31:19+00:00,https://www.nps.gov/colo/learn/news/national-p...,2B43974C-A793-4160-9DF1-D92EB818F520,2024-03-08
691,[],Park Closure,"Beginning Monday, August 14, 2023, constructio...",Rehabilitation of Colonial Parkway beginning A...,york,2024-03-08 11:27:54+00:00,https://www.nps.gov/colo/learn/news/national-p...,E82CDC19-42F7-4FCC-8BD5-E6110288BF18,2024-03-08


In [None]:
# this df has date gaps
alerts_by_category = (
    alerts_df.groupby(["alert_date", "category"])["description"]
    .count()
    .reset_index()
    .sort_values("alert_date")
)

# this one doesn't; notice how we unstack / stack the index

alerts_df["alert_date"] = pd.to_datetime(alerts_df["lastIndexedDate"])

# Use Grouper to count alerts per calendar day and category
# (days with no alerts are still missing at this point)

alerts_no_gaps = (
    alerts_df.set_index("alert_date")
    .groupby([pd.Grouper(freq="1D"), "category"])["description"]
    .count()
)

# Unstack the category index to columns, fill in missing dates, and fill in NaNs with 0
# (selecting the four named categories also drops the empty-string category)

num_alerts_unstacked = (
    alerts_no_gaps.unstack()
    .resample("1D")
    .asfreq()[["Caution", "Danger", "Information", "Park Closure"]]
    .fillna(0)
)

# Stack the category index back into a column

num_alerts = (
    num_alerts_unstacked.stack().reset_index().rename(columns={0: "num_alerts"})
)

In [34]:
alerts_df["category"].unique()

array(['Danger', 'Caution', 'Information', 'Park Closure', ''],
      dtype=object)

In [35]:
alerts_no_gaps

alert_date                 category    
2012-04-02 00:00:00+00:00  Information     1
2012-10-09 00:00:00+00:00                  1
2014-03-29 00:00:00+00:00  Caution         1
2014-11-30 00:00:00+00:00  Information     1
2015-05-21 00:00:00+00:00  Caution         1
                                          ..
2024-03-22 00:00:00+00:00  Information     3
                           Park Closure    5
2024-03-23 00:00:00+00:00  Caution         2
                           Information     2
                           Park Closure    7
Name: description, Length: 472, dtype: int64

In [36]:
num_alerts_unstacked

alert_date,Caution,Danger,Information,Park Closure
2012-04-02 00:00:00+00:00,0.0,0.0,1.0,0.0
2012-04-03 00:00:00+00:00,0.0,0.0,0.0,0.0
2012-04-04 00:00:00+00:00,0.0,0.0,0.0,0.0
2012-04-05 00:00:00+00:00,0.0,0.0,0.0,0.0
2012-04-06 00:00:00+00:00,0.0,0.0,0.0,0.0
...,...,...,...,...
2024-03-19 00:00:00+00:00,2.0,0.0,3.0,0.0
2024-03-20 00:00:00+00:00,2.0,1.0,1.0,3.0
2024-03-21 00:00:00+00:00,1.0,1.0,4.0,7.0
2024-03-22 00:00:00+00:00,2.0,0.0,3.0,5.0


In [38]:
num_alerts["category"].unique()

array(['Caution', 'Danger', 'Information', 'Park Closure'], dtype=object)
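The unstack / fill / stack pattern is easier to follow on a tiny dataset. The sketch below uses a hypothetical `events` frame (the column names and values are assumptions, not the NPS data) and follows the same steps as the cells above.

```python
import pandas as pd

# Hypothetical events; there are no rows for 2024-01-02.
events = pd.DataFrame(
    {
        "event_time": pd.to_datetime(
            ["2024-01-01 09:00", "2024-01-01 15:30", "2024-01-03 08:15"]
        ),
        "category": ["Danger", "Caution", "Danger"],
        "title": ["a", "b", "c"],
    }
)

# Count events per calendar day and category.
counts = (
    events.set_index("event_time")
    .groupby([pd.Grouper(freq="1D"), "category"])["title"]
    .count()
)

# Categories become columns, asfreq() inserts the missing day, fillna() zeroes it.
wide = counts.unstack().resample("1D").asfreq().fillna(0)

# Back to long form: one row per (day, category) pair.
long = wide.stack().reset_index().rename(columns={0: "num_events"})

print(long)
```

The wide form is what makes the gap filling simple: with one row per day, `asfreq()` only has to add rows, and every category column receives a value for the new day.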

We'll dig into date ranges more in our next lesson on pandas windows. Of course, a pandas Series can also be built from any Python `range`.

In [39]:
pd.DataFrame(
    {
        "one_to_hundred": pd.Series(range(1, 101)),
        "hundred_to_one": pd.Series(range(100, 0, -1)),
        "one_hundred_by_twos": pd.Series(range(2, 202, 2)),
        "one_hundred_squares": pd.Series(range(1, 101)) ** 2,
    }
)

Unnamed: 0,one_to_hundred,hundred_to_one,one_hundred_by_twos,one_hundred_squares
0,1,100,2,1
1,2,99,4,4
2,3,98,6,9
3,4,97,8,16
4,5,96,10,25
...,...,...,...,...
95,96,5,192,9216
96,97,4,194,9409
97,98,3,196,9604
98,99,2,198,9801
