# About the Repository

The Brazilian government is working to open its natural gas market to third parties and is changing some laws regarding the fuel.

This repository aims to collect data about natural gas in Brazil and, in the future, to build a dashboard with all the information gathered.

# Business Understanding

## Context

Today, Brazilian natural gas production and transportation are controlled by Petrobras, a state-owned multinational corporation in the petroleum industry, ranked as the 120th largest company in the world by revenue². The company has 6 business areas¹ (in order of revenue):

> * `Refining, transportation and marketing`
> * `Exploration and production`
> * `Distribution` 
> * `Gas and power` 
> * `International`
> * `Biofuels`

### Gas and Power

Its core activities are the transportation and trading of natural gas and LNG, the generation and trading of electric power, and the fertilizer business.

It is important to mention that Petrobras also controls the distribution of oil products, ethanol, biodiesel and `natural gas` to wholesalers and through the Petrobras Distribuidora S.A. retail network in Brazil.

### Termination of Commitment 

The Administrative Council for Economic Defense (Portuguese: Conselho Administrativo de Defesa Econômica, CADE) and Petrobras signed a Term of Commitment to Terminate³ (TCC) following investigations into alleged anti-competitive conduct by Petrobras in the Brazilian natural gas market, including abuse of a dominant position and discrimination against competitors through differentiated pricing.

Under the agreement, the state company commits to selling assets related to the natural gas market. The measure aims to prevent a recurrence of the conduct investigated by CADE and to stimulate competition in a sector so far operated almost entirely by Petrobras, through the entry of new agents that would attract national and international investment at various levels of the production chain.




Sources:

1. [Wikipedia](https://en.wikipedia.org/wiki/Petrobras)

2. [Fortune - Global 500](https://fortune.com/company/petrobras/global500/)

3. [CADE](http://www.cade.gov.br/noticias/cade-e-petrobras-celebram-acordo-para-venda-de-ativos-no-mercado-de-gas-natural)

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import math
import datetime

from googletrans import Translator

## Wrangling Data

### Looking at the data from ANP

The first step is to understand the data and how it is stored. To do that, we open the spreadsheet file and look at it in its original form.

<img src="data_set/df1-page-001.jpg"  width="800" align="center">


#### The subject

It is a table, in Portuguese, about the production of Brazilian natural gas in million cubic meters during the years 2010-2019. This production is grouped by federation unit and followed by location. 

In the federation unit (first) column, the total and the subtotal of national production appear above the gas-producing states.

The location can be categorized as onshore or offshore; national production can also be categorized by geological layer, pre-salt and post-salt.

The last column is the gas production ratio between 2019 and 2018.

At the bottom are the table's source and a note saying that the total production value includes the volumes of reinjection, burning, reduction and own consumption.

#### Creating a DataFrame

Pandas will be used to read the Excel file and create the DataFrame.

In [4]:
df = pd.read_excel(r'data_set/anuario-2020-tabela-2_30.xls',  header = [0,2,3], index_col = [0,1])

Now let's take a look at how `df` is displayed.

In [5]:
df.head()

Unnamed: 0_level_0,"Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019"
Unnamed: 0_level_1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1
Unnamed: 0_level_2,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
Total,76396,90396.0,90396.0,90396.0,96390.0,95350.0,95650.0,95650.0,95650.0,107210.0
Total,76396,,,,,,,,,
Urucu,9706,9706.0,9706.0,9706.0,12200.0,12200.0,12200.0,12200.0,12200.0,12200.0
Lubnor,350,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0
Guamaré,5700,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0


It doesn't look good. Since the spreadsheet is a multi-indexed table, the `header` and `index_col` parameters of the `pd.read_excel()` function must be set so that the DataFrame is indexed the same way as the original table.

Looking again at the table, it is easy to tell that rows `1`, `3` and `4` form the `header` (column labels) and columns `A` and `B` form the `index_col` (index labels). Note that the parameters must be passed as zero-based positions (e.g. column `A` is index `0` and `B` is `1`; rows `1`, `3` and `4` are `0`, `2` and `3`).
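
As a minimal sketch of how `header` and `index_col` map rows and columns to zero-based positions, the example below reads a small in-memory CSV instead of the real ANP file. The sample text, the names `raw` and `demo`, and the use of `pd.read_csv` (rather than `pd.read_excel`) are assumptions made only to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical miniature of a table with two header rows and one label column.
raw = (
    "Table title,Table title,Table title\n"
    "pole,2018,2019\n"
    "Urucu,12200,12200\n"
    "Lubnor,350,350\n"
)

# Rows 1 and 2 of the file become the column MultiIndex (positions 0 and 1);
# column A becomes the row index (position 0).
demo = pd.read_csv(io.StringIO(raw), header=[0, 1], index_col=0)

print(demo.columns.nlevels)   # number of header levels
print(list(demo.index))       # row labels taken from column A
```

The same positional logic applies to `pd.read_excel`, where `header=[0, 2, 3]` skips the second spreadsheet row.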

#### Reading Dataframe as a MultiIndexed Table

In [6]:
df = pd.read_excel(r'data_set/anuario-2020-tabela-2_30.xls',  header = [0,2,3], index_col = [0,1])


In [7]:
df.head()

Unnamed: 0_level_0,"Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019","Tabela 2.30 – Evolução da capacidade de processamento de gás natural, segundo polos produtores – 2010-2019"
Unnamed: 0_level_1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1,Capacidade de processamento (mil m3/dia)1
Unnamed: 0_level_2,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
Total,76396,90396.0,90396.0,90396.0,96390.0,95350.0,95650.0,95650.0,95650.0,107210.0
Total,76396,,,,,,,,,
Urucu,9706,9706.0,9706.0,9706.0,12200.0,12200.0,12200.0,12200.0,12200.0,12200.0
Lubnor,350,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0
Guamaré,5700,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0


Now that we have the MultiIndexed configuration ready, we should drop all unnecessary rows and columns. 

In this specific table, the first two levels of the column labels can be dropped to improve readability. But they hold useful information (title and unit) that may be needed later, so it is a good idea to save them first. We will also need to translate some information to English, using the `Translator()` class from the `googletrans` package.

#### Saving and Translating Table's Title and Unit

In [8]:
translator = Translator()

In [9]:
title = translator.translate(df.columns[0][0]).text
title

'Table 2.30 - Evolution of natural gas processing capacity, according to producer poles - 2010-2019'

In [10]:
unit = translator.translate(df.columns[1][1]).text
unit

'Processing capacity (thousand m3 / day) 1'

#### Dropping Unnecessary Rows and Columns

Let's drop the first two MultiIndex levels from the columns.

##### First Two Levels From Column MultiIndex

In [11]:
df.columns = df.columns.droplevel(0)
df.columns = df.columns.droplevel(0)

In [12]:
df.head()

Unnamed: 0,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
Total,76396,90396.0,90396.0,90396.0,96390.0,95350.0,95650.0,95650.0,95650.0,107210.0
Total,76396,,,,,,,,,
Urucu,9706,9706.0,9706.0,9706.0,12200.0,12200.0,12200.0,12200.0,12200.0,12200.0
Lubnor,350,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0
Guamaré,5700,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0
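
A minimal sketch of the same idea on a toy frame, assuming a three-level column MultiIndex built by hand: the upper levels are saved first and then removed in a single `droplevel` call. The names `toy`, `saved_title` and `saved_unit` are hypothetical.

```python
import pandas as pd

# Toy frame: level 0 = title, level 1 = unit, level 2 = year.
cols = pd.MultiIndex.from_product([["Some title"], ["Some unit"], [2018, 2019]])
toy = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=cols)

# Save the information held in the upper levels before discarding them.
saved_title = toy.columns[0][0]
saved_unit = toy.columns[0][1]

# Passing a list drops both upper levels at once.
toy.columns = toy.columns.droplevel([0, 1])

print(saved_title, saved_unit, list(toy.columns))
```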


##### NaN Rows and Columns

Now let's get rid of all rows and columns that contain only NaN.

In [13]:
df.dropna(how = 'all', inplace = True)
df.dropna(axis=1, how = 'all', inplace = True)
df

Unnamed: 0,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
Total,76396,90396.0,90396.0,90396.0,96390.0,95350.0,95650.0,95650.0,95650.0,107210.0
Urucu,9706,9706.0,9706.0,9706.0,12200.0,12200.0,12200.0,12200.0,12200.0,12200.0
Lubnor,350,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0
Guamaré,5700,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0
Pilar,1800,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0
Atalaia,3000,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0
Candeias,2900,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0
Santiago²,4400,4400.0,4400.0,4400.0,1900.0,1900.0,2000.0,2000.0,2000.0,2000.0
Estação Vandemir Ferreira,6000,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0
Cacimbas,16000,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,18100.0
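
To illustrate what `how='all'` does on each axis, here is a small sketch on made-up data. The frame `toy` and its labels are assumptions; only rows and columns that are entirely NaN are removed, while partially filled ones are kept.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "2018": [1.0, np.nan, 3.0],
        "empty_col": [np.nan, np.nan, np.nan],
        "2019": [2.0, np.nan, np.nan],
    },
    index=["a", "empty_row", "b"],
)

# First drop rows that are all NaN, then columns that are all NaN.
cleaned = toy.dropna(how="all").dropna(axis="columns", how="all")

print(cleaned)
```

Row `b` survives even though its 2019 value is missing, because `how='all'` requires every value to be NaN.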


##### Ratio Column

The last column is the ratio of gas production between 2018 and 2019. This value can be recomputed from the yearly columns, so let's drop it.

In [14]:
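# Drop the last column only if every value matches the percentage change
# computed from the two columns before it.
last = pd.to_numeric(df.iloc[:,-1].replace('..', 0), errors='coerce').round()
expected = (((df.iloc[:,-2]-df.iloc[:,-3])/df.iloc[:,-3])*100).round()
if (last == expected).all():
    df = df.drop(df.columns[-1], axis=1)
    
df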

Unnamed: 0,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
Total,76396,90396.0,90396.0,90396.0,96390.0,95350.0,95650.0,95650.0,95650.0,107210.0
Urucu,9706,9706.0,9706.0,9706.0,12200.0,12200.0,12200.0,12200.0,12200.0,12200.0
Lubnor,350,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0
Guamaré,5700,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0
Pilar,1800,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0
Atalaia,3000,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0
Candeias,2900,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0
Santiago²,4400,4400.0,4400.0,4400.0,1900.0,1900.0,2000.0,2000.0,2000.0,2000.0
Estação Vandemir Ferreira,6000,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0
Cacimbas,16000,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,18100.0
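
The redundancy check can be sketched on a toy table where the last column really is the year-over-year percentage change. The frame `toy`, its column names and its values are invented for illustration; `np.allclose` is used here as one possible way to compare the columns element by element with a floating-point tolerance.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "2018": [100.0, 200.0],
        "2019": [110.0, 150.0],
        "19/18 %": [10.0, -25.0],
    },
    index=["a", "b"],
)

# Recompute the percentage change from the two yearly columns.
expected = (toy["2019"] - toy["2018"]) / toy["2018"] * 100

# The last column is redundant only if it matches row by row.
is_redundant = np.allclose(toy.iloc[:, -1], expected)
if is_redundant:
    toy = toy.drop(columns=toy.columns[-1])

print(is_redundant, list(toy.columns))
```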


#### Correcting Index Labels

Now we need to correct the index labels.

As you can see, some of the table's labels are in Portuguese, so we need to be able to translate them from Portuguese to English where necessary.

##### Translating Index Labels

In [15]:
if df.index.nlevels > 1:
    
    for i, num in enumerate(df.index):
        for j in range(df.index.nlevels):
            if j==0:
                if (df.index[i][j] == 'Espírito_Santo') or (df.index[i][j] == 'Espirito_Santo'):
                    df.index = df.index.set_levels(df.index.levels[j].str.replace('Espírito_Santo','Espirito_Santo'), level = j)
                elif df.index[i][j] == 'Amazonas':
                    df.index = df.index.set_levels(df.index.levels[j].str.replace('Amazonas','Amazonas'), level = j)
                elif df.index[i][j] == 'Alagoas':
                    df.index = df.index.set_levels(df.index.levels[j].str.replace('Alagoas','Alagoas'), level = j)
                elif (df.index[i][j] == 'Ceará') or (df.index[i][j] == 'Ceara'):
                    df.index = df.index.set_levels(df.index.levels[j].str.replace('Ceará','Ceara'), level = j)
                elif (df.index[i][j] == 'Rio Grande do Norte') or (df.index[i][j] == 'Rio_Grande_do_Norte'):
                    df.index = df.index.set_levels(df.index.levels[j].str.replace(' ','_'), level = j)
                else:
                    df.index = df.index.set_levels(df.index.levels[j].str.replace(df.index[i][j], translator.translate(df.index[i][j]).text), level = j)
            if j==1:
                if df.index[i][j] == 'Mar': # checks if one of the words that the translate package can not translate
                    df.index = df.index.set_levels(df.index.levels[j].str.replace('Mar','Offshore'), level = j)
                elif df.index[i][j] == 'Terra': # checks if one of the words that the translate package can not translate
                    df.index = df.index.set_levels(df.index.levels[j].str.replace('Terra','Onshore'), level = j)
                elif not isinstance(df.index[i][j], str):
                    pass
                else:
                    df.index = df.index.set_levels(df.index.levels[j].str.replace(df.index[i][j], translator.translate(df.index[i][j]).text), level = j)
                    
elif df.index.nlevels == 1:
    new_index = []
    for index in df.index:
        if index == 'Reinjeção':
            new_index.append('Reinjection')
        else:
            new_index.append(translator.translate(index).text)
    df.index = new_index

In [17]:
df

Unnamed: 0,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
Total,76396,90396.0,90396.0,90396.0,96390.0,95350.0,95650.0,95650.0,95650.0,107210.0
Urucu,9706,9706.0,9706.0,9706.0,12200.0,12200.0,12200.0,12200.0,12200.0,12200.0
Lubnor,350,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0
Guamaré,5700,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0,5700.0
Pilar,1800,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0,1800.0
Atalaia,3000,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0
Candles,2900,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0,2900.0
Santiago²,4400,4400.0,4400.0,4400.0,1900.0,1900.0,2000.0,2000.0,2000.0,2000.0
Vandemir Ferreira Station,6000,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0,6000.0
Cacimbas,16000,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,16000.0,18100.0
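
For terms with a known fixed translation, such as `Mar` and `Terra`, one possible alternative to looping over the index is `DataFrame.rename` with a mapping and a `level` argument. The sketch below uses a hand-built two-level index; the frame `toy`, the level names and the values are assumptions.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("Amazonas", "Terra"), ("Rio de Janeiro", "Mar")],
    names=["state", "location"],
)
toy = pd.DataFrame({"2019": [1.0, 2.0]}, index=idx)

# Only the "location" level is touched, so proper nouns in "state" stay as they are.
toy = toy.rename(index={"Mar": "Offshore", "Terra": "Onshore"}, level="location")

print(list(toy.index))
```

Restricting the mapping to one level avoids machine-translating place names, which is how `Candeias` ended up as `Candles` in the output above.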


##### MyDataFrame Class

In order to capture and simplify access to important information *(e.g. title, unit)* about the collected tables, and to gather all functions as methods in one place, a class was created. The `MyDataFrame` class also makes some changes to the original tables to improve their readability and to translate some terms.

Parameters:

>`df`: a DataFrame, as returned by `pd.read_excel` or `pd.read_csv`.

>`translate`: a boolean that enables the translation methods.

>`translate_first_level`: a boolean that says whether the first level of a MultiIndex DataFrame should be translated, since some of its labels are proper nouns and should not be translated.

>`white_space`: a boolean that replaces white space with underscores in all index labels, which helps when using `loc`.

>`drop_level`: a boolean that calls the `drop_levels` method, which drops column levels from a MultiIndex DataFrame until a single level remains.


        self.translator = Translator()
        self.title = ''
        self.unit = ''
        self.footer = ''
        self.translate = translate
        self.translate_first_level = translate_first_level
        self.white_space= white_space
        self.drop_level = drop_level


Attributes:

>`df`: the DataFrame itself.

>`translator`: an instance of `Translator()`.

>`title`: the DataFrame's title.

>`unit`: the DataFrame's unit.

>`footer`: the DataFrame's source and notes.

>`translate`: a boolean that enables the translation methods.

>`translate_first_level`: a boolean that controls whether the first level of a MultiIndex is translated.

Methods:

>`drop_na()`: drops all rows and columns whose values are all NaN.

>`drop_levels()`: drops all column levels that describe the table as a whole rather than its individual values. That information is kept as the table's title and unit for later use.

>`translate_index()`: translates a single-level index to English.

>`translate_multi_index()`: translates a MultiIndex to English.

>`drop_unnamed_column()`: drops the last column if it was read as an unnamed column. The value of this column is a ratio that is redundant for our project.

Translations from Portuguese to English will also be performed within the class using `googletrans` package. The words that are not supported by the package will be translated directly using a dictionary.
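
A dictionary of fixed replacements with a fallback translator could be sketched as follows. This is an illustration rather than the class's actual implementation: `OVERRIDES`, `translate_labels` and the `fallback` argument are hypothetical names, and the default fallback simply returns the label unchanged so that the example runs without network access. In the notebook, the fallback could be something like `lambda s: translator.translate(s).text`.

```python
# Terms with a known, fixed translation (taken from the cases handled in this notebook).
OVERRIDES = {
    "Mar": "Offshore",
    "Terra": "Onshore",
    "Reinjeção": "Reinjection",
}


def translate_labels(labels, overrides=OVERRIDES, fallback=lambda s: s):
    """Translate labels using fixed overrides first, then a fallback callable."""
    translated = []
    for label in labels:
        if not isinstance(label, str):
            # Leave NaN and numeric labels untouched.
            translated.append(label)
        elif label in overrides:
            translated.append(overrides[label])
        else:
            translated.append(fallback(label))
    return translated


result = translate_labels(["Mar", "Terra", "Urucu", 3])
print(result)
```

Keeping the overrides in one dictionary makes it easy to add proper nouns that should never be sent to the translator.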

In [1]:
class MyDataFrame: 
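    def __init__(self, df, translate=False, translate_first_level=False, white_space=False, drop_level=True):
        
        self.df = df
        self.translate = translate
        self.translate_first_level = translate_first_level
        self.white_space = white_space
        self.drop_level = drop_level
        self.translator = Translator()
        self.title = ''
        self.unit = ''
        self.footer = ''

        # Title and unit live in the upper column levels, so capture them before dropping.
        if self.df.columns.nlevels > 1:
            self.title_unit_multiindex()
        
        if self.drop_level:
            self.drop_levels()

        if self.translate:
            if self.df.index.nlevels == 1:
                self.translate_index()
            else:
                self.translate_multi_index()

        if self.white_space:
            self.replace_white_space()

        self.drop_na()


    def replace_white_space(self):
        """
        Replaces white space with underscores in all index labels.
        """
        if self.df.index.nlevels > 1:
            for i in range(self.df.index.nlevels):
                self.df.index = self.df.index.set_levels(self.df.index.levels[i].str.replace(" ", "_"), level = i)
        else:
            self.df.index = self.df.index.str.replace(" ", "_")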
            
            
    def title_unit_multiindex(self):
        self.title = self.df.columns[0][0]
        self.unit = self.df.columns[1][1]
            
    def drop_levels(self):
        """
        Drops the upper column levels, which hold the information captured earlier (the table's title and unit).
        """     
        while self.df.columns.nlevels>1:
            self.df.columns = self.df.columns.droplevel(0)
            
            
    def drop_na(self):
        """
        Drops all rows and columns that have all values equals to NaN.
        """  
        self.df.dropna(how = 'all', inplace = True)
        self.df.dropna(axis = 'columns', how = 'all', inplace = True)

    
    def translate_index(self):
        """
        Translates the index of a DataFrame to English.
        """
        self.new_index = []
        for index in self.df.index:
            if index == 'Reinjeção':
                self.new_index.append('Reinjection')
            elif (index == 'Espírito Santo') or (index == 'Espirito_Santo'):
                self.new_index.append('Espirito_Santo')
            elif index == 'Amazonas':
                self.new_index.append('Amazonas')
            elif index == 'Alagoas':
                self.new_index.append('Alagoas')
            elif (index == 'Ceará') or (index == 'Ceara'):
                self.new_index.append('Ceara')
            elif (index == 'Rio Grande do Norte') or (index == 'Rio_Grande_do_Norte'):
                self.new_index.append('Rio_Grande_do_Norte')
            else:
                self.new_index.append(self.translator.translate(index).text)
        self.df.index = self.new_index
        
                        
    def translate_multi_index(self):
        """
        Translates a MultiIndex DataFrame to English.
        """
        
        if self.translate_first_level == True:
            for i, num in enumerate(self.df.index):
                    for j in range(self.df.index.nlevels):       
                        if j==0:
                            if (self.df.index[i][j] == 'Espírito_Santo') or (self.df.index[i][j] == 'Espirito_Santo'):
                                self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace('Espírito_Santo','Espirito_Santo'), level = j)
                            elif self.df.index[i][j] == 'Amazonas':
                                self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace('Amazonas','Amazonas'), level = j)
                            elif self.df.index[i][j] == 'Alagoas':
                                self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace('Alagoas','Alagoas'), level = j)
                            elif (self.df.index[i][j] == 'Ceará') or (self.df.index[i][j] == 'Ceara'):
                                self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace('Ceará','Ceara'), level = j)
                            elif (self.df.index[i][j] == 'Rio Grande do Norte') or (self.df.index[i][j] == 'Rio_Grande_do_Norte'):
                                self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace(' ','_'), level = j)
                            else:
                                self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace(self.df.index[i][j], self.translator.translate(self.df.index[i][j]).text), level = j)
        
        for i, num in enumerate(self.df.index):
            for j in range(self.df.index.nlevels):
                if j==0:
                    pass
                if j==1:
                    if self.df.index[i][j] == 'Mar': # checks if one of the words that the translate package can not translate
                        self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace('Mar','Offshore'), level = j)
                    elif self.df.index[i][j] == 'Terra': # checks if one of the words that the translate package can not translate
                        self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace('Terra','Onshore'), level = j)
                    elif not isinstance(self.df.index[i][j], str):
                        pass
                    else:
                        self.df.index = self.df.index.set_levels(self.df.index.levels[j].str.replace(self.df.index[i][j], self.translator.translate(self.df.index[i][j]).text), level = j)
        
        
    def replace_underscore(self):
        """
        Replaces all underscore for white space.
        """
        if self.df.index.nlevels > 1: # tells how many level are
            for i, level in enumerate(range(self.df.index.nlevels)): # runs through levels
                #for j, value in enumerate(self.df.index.levels[i]): # runs through the level's value and replace white space for underline
                self.df.index = self.df.index.set_levels(self.df.index.levels[i].str.replace("_", " "), level = i)
        
        elif self.df.index.nlevels == 1:
            self.new_index = []
            for index in self.df.index:
                self.new_index.append(index.replace('_', ' '))
            self.df.index = self.new_index
            
            
    def drop_unnamed_column(self):
        """
        Drops the last column if its name starts with 'Unnamed'.
        """
        # Check only the last column, so that named columns are never dropped.
        last = self.df.columns[-1]
        if isinstance(last, str) and last.startswith('Unnamed'):
            self.df = self.df.drop(last, axis=1)

    def index_sups(self):
        """
        Replaces the digits 1-9 in index labels with superscript characters.
        """
        # One translation table covers all nine digit replacements.
        sups = str.maketrans('123456789', '¹²³⁴⁵⁶⁷⁸⁹')

        if self.df.index.nlevels > 1:
            # Only the first index level is rewritten.
            self.df.index = self.df.index.set_levels(self.df.index.levels[0].str.translate(sups), level = 0)
        else:
            self.df.index = self.df.index.str.translate(sups)


## About the data

The data were collected from:

* [National Petroleum Agency Statistical Yearbook 2020](http://www.anp.gov.br/publicacoes/anuario-estatistico/anuario-estatistico-2020), ANP *(Portuguese: Agência Nacional de Petróleo)*: consolidates data on the performance of the Brazilian oil, natural gas and biofuels industry and the national supply system in 2010-2019;

* [ANEEL](https://www.aneel.gov.br/dados/geracao) (Portuguese: Agência Nacional de Energia Elétrica) Generation by Source: history of the volume of electricity produced in the country in GWh, expressed by the energy load dispatched in the National Interconnected System (SIN), classified by renewable or non-renewable source, plus the volume produced by generators not yet interconnected.

* [The World Bank](https://data.worldbank.org/country/brazil)
* [Ministério de Minas e Energia](http://www.mme.gov.br/documents/36216/1119340/06+-+Boletim+Mensal+de+Acompanhamento+da+Ind%C3%BAstria+de+G%C3%A1s+Natural+Junho+2020/4ecd27ca-bd64-bfa7-3510-03799045f87f) Demand by Segment: Monthly Industry Follow-up Bulletin of Natural Gas.

* [NASA Giovanni](https://giovanni.gsfc.nasa.gov/giovanni/) Precipitation Data. Instructions after logging in to the website:


> 1. `Select Plot`: Time Series, Seasonal

> 2. `Select Seasonal Dates`: Select all months, years 2010 to 2018

> 3. `Select Region`: Countries Brazil;

> 4. `Keyword`: Precipitation

> 5. `Variable`: Precipitation Rate (TRMM_3B43 v7)

> 6. `Units`: mm/month

> After making the selection above, click on `Plot Data`

* [EIA](https://www.eia.gov/environment/emissions/co2_vol_mass.php) Carbon Dioxide Emissions Coefficients

