# Welcome to project Dionysus!

You were given the challenge of building code that can aggregate and analyse wine production data from many producers.  
Since you want to learn more about wine and automating processes, you accept.

You are given three datasets, [Almeirim.csv](Almeirim.csv), [Benavente.csv](Benavente.csv), and [Cartaxo.csv](Cartaxo.csv)
representing monthly wine bottling output from three small to medium producers.

The data represents historical **monthly** output in thousands of litres. The figures are rough estimates of output, not exact measurements.

**All datasets cover the same time interval!**

These three datasets represent a subset of the **thousands** of datasets you will have to analyse in the future project. **HOWEVER**, these 3 datasets represent the 3 unique configurations for all future dataset analysis.

If your analysis works for these 3, it will work for all future datasets.

To better prepare for future analysis you decide to develop some code to speed things up.

## The Analysis

### Exploration
1. What combination of format directives and characters lets you parse the column of Almeirim.csv that holds the temporal information?
2. And for Benavente.csv?
3. And for Cartaxo.csv?
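
As a quick reminder of what "directives" means here, the sketch below parses two date strings with `pd.to_datetime` and an explicit `format`. The sample strings are copied from the first rows of the `head()` outputs further down. The variable names are illustrative only. `%y` is a two-digit year, `%Y` a four-digit year, `%m` a zero-padded month number, and `%b` an abbreviated month name (this assumes an English locale).

```python
import pandas as pd

# "85-01": two-digit year, a literal hyphen, then the month number.
two_digit_year = pd.to_datetime("85-01", format="%y-%m")

# "Jan,1985": abbreviated month name, a literal comma, then a four-digit year.
# Assumption: month abbreviations are interpreted in an English locale.
month_name = pd.to_datetime("Jan,1985", format="%b,%Y")

print(two_digit_year, month_name)
```

Literal characters in the format string (the hyphen and the comma) must match the separators in the data exactly, which is why each file layout needs its own combination.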

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os 
from statsmodels.tsa.statespace.sarimax import SARIMAX
import glob

In [7]:
datafileName1 = os.path.join('..','data','Almeirim.csv')
df_a = pd.read_csv(datafileName1)
df_a.head()

Unnamed: 0,date,output
0,85-01,72.5052
1,85-02,70.672
2,85-03,62.4502
3,85-04,57.4714
4,85-05,55.3151


In [8]:
datafileName2 = os.path.join('..','data','Benavente.csv')
df_b = pd.read_csv(datafileName2)
df_b.head()

Unnamed: 0,date,output
0,"Jan,1985",99.178372
1,"Feb,1985",96.736512
2,"Mar,1985",85.540669
3,"Apr,1985",78.774726
4,"May,1985",75.870978


In [9]:
datafileName3 = os.path.join('..','data','Cartaxo.csv')
df_c = pd.read_csv(datafileName3)
df_c.head()

Unnamed: 0,date,output
0,85-01,60.421
1,85-02,59.042242
2,85-03,52.305336
3,85-04,48.257036
4,85-05,46.563892


In [11]:
# Parse the 'date' column as datetime; both files use a two-digit year and month number ('%y-%m')
almeirim_df = pd.read_csv(datafileName1, parse_dates=['date'], date_parser=lambda x: pd.to_datetime(x, format='%y-%m'))
cartaxo_df = pd.read_csv(datafileName3, parse_dates=['date'], date_parser=lambda x: pd.to_datetime(x, format='%y-%m'))

In [12]:
almeirim_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 396 entries, 0 to 395
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    396 non-null    datetime64[ns]
 1   output  396 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 6.3 KB


In [13]:
cartaxo_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 396 entries, 0 to 395
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    396 non-null    datetime64[ns]
 1   output  396 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 6.3 KB


In [15]:
benavente_df = pd.read_csv(datafileName2, parse_dates=['date'], date_parser=lambda x: pd.to_datetime(x, format='%b,%Y'))
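
Since the three files only use two distinct date layouts, one possible way to avoid hard-coding the format per file is to try each known format in turn. The following is a minimal sketch under that assumption. `CANDIDATE_FORMATS` and `parse_dates` are hypothetical names that do not come from the notebook, and the sample series reuse values shown in the `head()` outputs above.

```python
import pandas as pd

# Assumption: these are the only date layouts that occur, as observed in the three sample files.
CANDIDATE_FORMATS = ("%y-%m", "%b,%Y")


def parse_dates(series, formats=CANDIDATE_FORMATS):
    """Parse a series of date strings with the first format that fits every value."""
    for fmt in formats:
        try:
            # With the default errors="raise", a mismatch raises ValueError.
            return pd.to_datetime(series, format=fmt)
        except ValueError:
            continue
    raise ValueError(f"none of the formats {formats} matched the date column")


# Small in-memory samples standing in for the 'date' columns of the CSV files.
sample_a = pd.Series(["85-01", "85-02"])        # Almeirim / Cartaxo layout
sample_b = pd.Series(["Jan,1985", "Feb,1985"])  # Benavente layout

parsed_a = parse_dates(sample_a)
parsed_b = parse_dates(sample_b)
print(parsed_a.tolist(), parsed_b.tolist())
```

In the notebook this could be applied after a plain `pd.read_csv`, for example `df_b['date'] = parse_dates(df_b['date'])`. Whether trying formats in order is robust enough for the full collection of datasets is not something the three samples alone can confirm.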

In [16]:
class Wine:
    """
    A class to facilitate analysis of wine production data.

    Attributes
    ----------
    data : pd.DataFrame
        A dataframe containing the wine production data.
    producer : str
        The name of the wine producer.
    
    Methods
    -------
    __init__(self, data, producer)
        Constructor for the Wine class.
    """
    def __init__(self, data, producer):
        """
        Initializes the Wine object with data and a producer name.
        """
        self.data = data
        self.producer = producer