# Creating a dataframe

# Document

<table align="left">
    <tr>
        <th class="text-align:left">Title</th>
        <td class="text-align:left">Creating a dataframe</td>
    </tr>
    <tr>
        <th class="text-align:left">Last modified</th>
        <td class="text-align:left">2019-03-08</td>
    </tr>
    <tr>
        <th class="text-align:left">Author</th>
        <td class="text-align:left">Gilles Pilon &lt;gillespilon13@gmail.com&gt;</td>
    </tr>
    <tr>
        <th class="text-align:left">Status</th>
        <td class="text-align:left">Active</td>
    </tr>
    <tr>
        <th class="text-align:left">Type</th>
        <td class="text-align:left">Jupyter notebook</td>
    </tr>
    <tr>
        <th class="text-align:left">Created</th>
        <td class="text-align:left">2019-03-08</td>
    </tr>
    <tr>
        <th class="text-align:left">File name</th>
        <td class="text-align:left">dataframe_create.ipynb</td>
    </tr>
    <tr>
        <th class="text-align:left">Other files required</th>
        <td class="text-align:left"></td>
    </tr>
</table>

# Introduction

- Create a dataframe from a dictionary and save it to a csv file
- Import a scikit-learn dataset into a dataframe and save it to a csv file

## Create a dataframe and save it to a csv file

In [None]:
# Import the pandas package of data structures and data analysis tools.
import pandas as pd
# Import the sample datasets bundled with scikit-learn.
from sklearn import datasets

In [None]:
# Create the dataframe.
# The dependent variable is Stock_Index_Price.
# The independent variables are Interest_Rate and
# Unemployment_Rate.
stock_market = {'Year': [2017,2017,2017,2017,2017,2017,2017,
                         2017,2017,2017,2017,2017,2016,2016,
                         2016,2016,2016,2016,2016,2016,2016,
                         2016,2016,2016],
                'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,
                          10,9,8,7,6,5,4,3,2,1],
                'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,
                                  2.5,2.25,2.25,2.25,2,2,2,
                                  1.75,1.75,1.75,1.75,1.75,
                                  1.75,1.75,1.75,1.75,1.75,1.75],
                'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,
                                      5.5,5.5,5.5,5.6,5.7,5.9,
                                      6,5.9,5.8,6.1,6.2,6.1,
                                      6.1,6.1,5.9,6.2,6.2,6.1],
                'Stock_Index_Price': [1464,1394,1357,1293,1256,
                                      1254,1234,1195,1159,1167,
                                      1130,1075,1047,965,943,
                                      958,971,949,884,866,876,
                                      822,704,719]        
                }
df = pd.DataFrame(stock_market,columns=['Year','Month',
                                        'Interest_Rate',
                                        'Unemployment_Rate',
                                        'Stock_Index_Price'])

In [None]:
# df is the variable that holds the dataframe.
# A dataframe is a two-dimensional labeled data structure with columns
# of potentially different types.
# Think of it as a worksheet in an Excel workbook.
# to_csv() writes the dataframe to a csv file; by default the row index
# is written as the first column.
df.to_csv('stock_market.csv')
# Display the first five rows.
df.head()
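
The cell above only writes the csv file. A minimal sketch of reading it back into a dataframe follows. It assumes a three-row stand-in for the stock market data and a temporary directory (both chosen here for illustration), and it passes `index=False` so the row index is not written as an extra column.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the stock market dataframe (first three rows only).
df = pd.DataFrame({
    'Year': [2017, 2017, 2017],
    'Month': [12, 11, 10],
    'Interest_Rate': [2.75, 2.5, 2.5],
    'Unemployment_Rate': [5.3, 5.3, 5.3],
    'Stock_Index_Price': [1464, 1394, 1357],
})

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'stock_market.csv')
    # index=False omits the row index from the file.
    df.to_csv(path, index=False)
    # read_csv() parses the file back into a dataframe.
    df_from_csv = pd.read_csv(path)

print(df_from_csv.head())
```

If the file was written with the default `index=True`, passing `index_col=0` to `pd.read_csv()` should restore the first column as the index instead of reading it as data.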

## Importing datasets from scikit-learn

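The object returned by a scikit-learn dataset loader is not a dataframe. It is a dictionary-like Bunch object whose data, target, and feature_names entries must be loaded into a dataframe explicitly.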

In [None]:
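# load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2.
data = datasets.load_boston()
print(data.DESCR)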

In [None]:
data.feature_names

In [None]:
data.target
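
To make the structure of the loaded object concrete, here is a small sketch that inspects it. It uses `load_diabetes()` as a stand-in, an assumption made because `load_boston()` is no longer available in scikit-learn 1.2 and later. The `as_frame=True` option shown at the end assumes scikit-learn 0.23 or newer.

```python
import pandas as pd
from sklearn import datasets

# Stand-in dataset; the returned object has the same kind of structure
# (data, target, feature_names, DESCR) as the one used in this notebook.
bunch = datasets.load_diabetes()

# The loader returns a dictionary-like Bunch, not a dataframe.
print(type(bunch).__name__)
print(sorted(bunch.keys()))
is_dataframe = isinstance(bunch, pd.DataFrame)

# data is a 2-D NumPy array with one column per feature name,
# and target is a 1-D array with one value per row of data.
n_rows, n_features = bunch.data.shape

# With as_frame=True the loader builds the dataframes itself;
# frame holds the features plus the target column.
frame = datasets.load_diabetes(as_frame=True).frame
print(frame.head())
```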

In [None]:
# Load the data into a dataframe.
features = pd.DataFrame(data.data, columns=data.feature_names)
features.head()

In [None]:
# Load the target into a dataframe. MEDV is the median home value.
target = pd.DataFrame(data.target, columns=['MEDV'])
target.head()

In [None]:
features.to_csv('boston_features.csv')
target.to_csv('boston_target.csv')
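
As an alternative to keeping features and target in two files, the two dataframes can be joined column-wise before saving. The sketch below uses small hypothetical arrays and column names (`feature_a`, `feature_b`, `target`) in place of the real dataset, so only the technique carries over.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins shaped like data.data, data.target
# and data.feature_names.
feature_values = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
target_values = np.array([10.0, 20.0, 30.0])
feature_names = ['feature_a', 'feature_b']

features = pd.DataFrame(feature_values, columns=feature_names)
target = pd.DataFrame(target_values, columns=['target'])

# axis=1 places the dataframes side by side, aligning rows on the index.
combined = pd.concat([features, target], axis=1)
print(combined)
```

Because both dataframes are built with the default integer index, the rows line up one to one. If the indexes differed, `pd.concat()` would align on index labels and fill the gaps with missing values.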

# References

- [pandas to csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)

- [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/api.html#dataframe)