# Accessing Time Series Datasets
The raw time series datasets used in this project are stored in Azure Blob Storage. This notebook shows how to access the datasets and gives an overview of the data.

## Datasets

### M4
* Stored as train and test CSV files together with a CSV file of metadata. The data was downloaded from the M4 repository on GitHub <https://github.com/Mcompetitions/M4-methods>.
* Each line in a file is a distinct time series, and a given time series has the same row index in the train and test files.
* Time series are grouped in files by frequency.
* Frequencies are:
    * Yearly
    * Quarterly
    * Monthly
    * Weekly
    * Daily
    * Hourly
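
Because a series sits at the same row index in the train and test files, the two parts can be stitched back together by position. The following is a minimal sketch on made-up data, assuming the wide layout shown later in this notebook (series id in column `V1`, shorter series padded with empty cells); `full_series` is a hypothetical helper, not part of the M4 tooling.

```python
import io

import pandas as pd

# Tiny synthetic stand-ins for an M4 train/test file pair (values are made up)
train_csv = "V1,V2,V3,V4,V5\nY1,10.0,11.0,12.0,13.0\nY2,5.0,6.0,,\n"
test_csv = "V1,V2,V3\nY1,14.0,15.0\nY2,7.0,8.0\n"

train = pd.read_csv(io.StringIO(train_csv))
test = pd.read_csv(io.StringIO(test_csv))


def full_series(row: int) -> pd.Series:
    """Concatenate the train and test values of the series stored in `row`."""
    # Column V1 holds the series id; shorter series are padded with NaN
    history = train.iloc[row, 1:].dropna()
    future = test.iloc[row, 1:].dropna()
    values = pd.concat([history, future], ignore_index=True)
    return values.astype(float).rename(train.iloc[row, 0])


print(full_series(1).tolist())  # [5.0, 6.0, 7.0, 8.0]
```

Selecting by position with `iloc` mirrors the row-index convention described above; matching on the id column instead would be a reasonable extra check.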

### FRED
* The time series are stored in JSON files with 2000 time series in each file.
* Metadata for the time series is stored in files named `meta_xxxx.json`.
* Time series observations are stored in files named `raw_xxxx.json`.
* The data is collected from the Federal Reserve Economic Data database <https://fred.stlouisfed.org/>.
* It is collected using the `fred` Python API.
* Frequencies are:
    * Yearly
    * Quarterly
    * Monthly
    * Weekly
    * Daily
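
Since metadata and observations live in separate files, they usually need to be brought together before use. The sketch below joins the two on the `id` field that both record types carry in the examples further down. The records here are made up, and whether the real files list series in the same order is not stated, so the join goes through a dictionary rather than relying on position.

```python
# Made-up records that only mimic the fields shown in this notebook
meta_records = [
    {"id": "SERIES_A", "frequency": "Annual", "frequency_short": "A"},
    {"id": "SERIES_B", "frequency": "Monthly", "frequency_short": "M"},
]
raw_records = [
    {"id": "SERIES_B", "observations": [{"date": "2000-01-01", "value": "1.5"}]},
    {"id": "SERIES_A", "observations": [{"date": "1990-01-01", "value": "3.0"}]},
]

# Index the metadata by series id so lookups do not depend on list order
meta_by_id = {record["id"]: record for record in meta_records}

for raw in raw_records:
    meta = meta_by_id.get(raw["id"], {})
    print(raw["id"], meta.get("frequency_short"), len(raw["observations"]))
```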

## Access
Access to the datasets does not currently require access keys, because the blob storage containers are set to public read access. The data can be read with the Azure Storage Python SDK (`azure-storage-blob`).
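
With public read access, a blob is also reachable over plain HTTPS at `<account_url>/<container>/<blob_name>`. The sketch below only builds such a URL and makes no network call. `public_blob_url` is a hypothetical helper, not part of the Azure SDK, and the account URL and container name are the ones used in the cells of this notebook.

```python
from urllib.parse import quote

ACCOUNT_URL = "https://tsdatasets.blob.core.windows.net/"  # account used in this notebook


def public_blob_url(container: str, blob_name: str, account_url: str = ACCOUNT_URL) -> str:
    """Build the HTTPS URL of a blob in a public-read container (hypothetical helper)."""
    # Percent-encode the blob name but keep "/" so virtual folders survive
    return f"{account_url.rstrip('/')}/{container}/{quote(blob_name, safe='/')}"


print(public_blob_url("mfour", "Yearly-train.csv"))
# https://tsdatasets.blob.core.windows.net/mfour/Yearly-train.csv
```

Such a URL could be passed directly to `pandas.read_csv`, assuming public read access is still enabled on the container.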

In [35]:
from azure.storage.blob import BlobClient, ContainerClient, __version__

# Confirm which version of the SDK is installed
print("Azure Blob storage v" + __version__)

Azure Blob storage v12.5.0


### FRED metadata

In [49]:
import json

In [43]:
container_client = ContainerClient(account_url="https://tsdatasets.blob.core.windows.net/", container_name="fred")

# Take the first blob listed in the container (a metadata file)
first_blob = next(iter(container_client.list_blobs()))
bname = first_blob["name"]
print(f"Name of blob: {bname}")

# Download the blob to a local file with the same name
blob_client = BlobClient(account_url="https://tsdatasets.blob.core.windows.net/", container_name="fred", blob_name=bname)
with open(bname, "wb") as my_blob:
    download_stream = blob_client.download_blob()
    my_blob.write(download_stream.readall())

# Load the JSON content; the with-block closes the file automatically
with open(bname, "rb") as fp:
    blob_json = json.load(fp)

print(f"Number of samples in file: {len(blob_json)}\n")
print("Example of metadata for a time series:\n")
for key, value in blob_json[1].items():
    print(f"{key}: {value}")

Name of blob: meta_000000.json
Number of samples in file: 2001

Example of metadata for a time series: 
category_name: New England Textile Industry, 1815-1860
frequency: Annual
frequency_short: A
group_popularity: 2
id: CPNEAMOSKEAGI
last_updated: 2019-11-01 13:43:55-05
node_id: 33934
notes: Data were gathered by the authors from various textile collections deposited at various museums and libraries on the East Coast, and collected from the original business records of textile mills in New England wherever possible. Preference was given to records that were complete and continuous for long periods, and reasonably intelligible.

Reporting techniques differed greatly from mill to mill. To make the data comparable, each mill's output was allocated uniformly over the months covered by the accounting period. The monthly figures were then summed to calendar years.

More details about the data are available in the book chapter "The New England Textile Industry, 1825-60: Trends and Fl

### FRED observations

In [44]:
bname = "raw_000000.json"
print(f"Name of blob: {bname}")

# Download the observations file to a local file with the same name
blob_client = BlobClient(account_url="https://tsdatasets.blob.core.windows.net/", container_name="fred", blob_name=bname)
with open(bname, "wb") as my_blob:
    download_stream = blob_client.download_blob()
    my_blob.write(download_stream.readall())

# Load the JSON content; the with-block closes the file automatically
with open(bname, "rb") as fp:
    blob_json = json.load(fp)

print(f"Number of samples in file: {len(blob_json)}")
for key, value in blob_json[1].items():
    print(f"{key}: {value}")

Name of blob: raw_000000.json
Number of samples in file: 2001
category_name: New England Textile Industry, 1815-1860
frequency: Annual
id: CPNEAMOSKEAGI
node_id: 33934
observations: [{'date': '1837-01-01', 'value': '1272.0'}, {'date': '1838-01-01', 'value': '1407.0'}, {'date': '1839-01-01', 'value': '1453.0'}, {'date': '1840-01-01', 'value': '1126.0'}, {'date': '1841-01-01', 'value': '1077.0'}, {'date': '1842-01-01', 'value': '1102.0'}, {'date': '1843-01-01', 'value': '1148.0'}, {'date': '1844-01-01', 'value': '1160.0'}, {'date': '1845-01-01', 'value': '1189.0'}, {'date': '1846-01-01', 'value': '1190.0'}, {'date': '1847-01-01', 'value': '540.0
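
Both the dates and the values arrive as strings, so they need converting before any numerical work. The sketch below parses a list in the same shape into `(date, float)` pairs. Skipping non-numeric values is an assumption about how missing observations might be encoded, and the `"."` entry is made up to exercise that path; `parse_observations` is a hypothetical helper.

```python
from datetime import date

# Two observations copied from the output above, plus a made-up "." entry
observations = [
    {"date": "1837-01-01", "value": "1272.0"},
    {"date": "1838-01-01", "value": "1407.0"},
    {"date": "1839-01-01", "value": "."},
]


def parse_observations(obs):
    """Return (date, float) pairs, skipping values that are not numeric."""
    parsed = []
    for item in obs:
        try:
            value = float(item["value"])
        except ValueError:
            continue  # assumption: non-numeric strings mark missing values
        parsed.append((date.fromisoformat(item["date"]), value))
    return parsed


series = parse_observations(observations)
print(series[0])  # (datetime.date(1837, 1, 1), 1272.0)
```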

### M4

In [53]:
import pandas as pd

container_client = ContainerClient(account_url="https://tsdatasets.blob.core.windows.net/", container_name="mfour")

# List every blob in the container
blob_names = [b["name"] for b in container_client.list_blobs()]
for name in blob_names:
    print(f"Name of blob: {name}")

# Download the last blob listed
bname = blob_names[-1]
print(f"\nDownloading {bname}")
blob_client = BlobClient(account_url="https://tsdatasets.blob.core.windows.net/", container_name="mfour", blob_name=bname)
with open(bname, "wb") as my_blob:
    download_stream = blob_client.download_blob()
    my_blob.write(download_stream.readall())

# Each row is one time series; the first column holds the series id
df = pd.read_csv(bname)
print(df.head())

Name of blob: Daily-test.csv
Name of blob: Daily-train.csv
Name of blob: Hourly-test.csv
Name of blob: Hourly-train.csv
Name of blob: Monthly-test.csv
Name of blob: Monthly-train.csv
Name of blob: Quarterly-test.csv
Name of blob: Quarterly-train.csv
Name of blob: Weekly-test.csv
Name of blob: Weekly-train.csv
Name of blob: Yearly-test.csv
Name of blob: Yearly-train.csv

Downloading Yearly-train.csv
   V1      V2      V3      V4      V5      V6      V7      V8      V9     V10  \
0  Y1  5172.1  5133.5  5186.9  5084.6  5182.0  5414.3  5576.2  5752.9  5955.2   
1  Y2  2070.0  2104.0  2394.0  1651.0  1492.0  1348.0  1198.0  1192.0  1105.0   
2  Y3  2760.0  2980.0  3200.0  3450.0  3670.0  3850.0  4000.0  4160.0  4290.0   
3  Y4  3380.0  3670.0  3960.0  4190.0  4440.0  4700.0  4890.0  5060.0  5200.0   
4  Y5  1980.0  2030.0  2220.0  2530.0  2610.0  2720.0  2970.0  2980.0  3100.0   

   ...  V827  V828  V829  V830  V831  V832  V833  V834  V835  V836  
0  ...   NaN   NaN   NaN   NaN   NaN   NaN
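
The trailing `NaN` columns are padding: every row is as wide as the longest series in the file. One way to work with series of different lengths is to reshape the table into long format and drop the padding. This is a minimal sketch on made-up data in the same wide layout, and the column names `series_id`, `step` and `value` are illustrative choices.

```python
import io

import pandas as pd

# Made-up rows in the same wide layout: id in V1, values padded with NaN
csv_text = "V1,V2,V3,V4\nY1,1.0,2.0,3.0\nY2,4.0,,\n"
wide = pd.read_csv(io.StringIO(csv_text))

# Long format: one row per (series id, position) with the padding dropped
long_df = (
    wide.melt(id_vars="V1", var_name="step", value_name="value")
    .dropna(subset=["value"])
    .rename(columns={"V1": "series_id"})
)

# Number of actual observations per series
lengths = long_df.groupby("series_id")["value"].size()
print(lengths.to_dict())  # {'Y1': 3, 'Y2': 1}
```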