<font size="6">Importing Data using the `common.py` file</font>

This notebook demonstrates how to import data using the `common.py` module provided in the book **[Practical Statistics for Data Scientists](https://nbviewer.org/github/stevenkhwun/Online-resources/blob/main/Books_Handbooks_Notes/Statistics/PracticalStatisticsforDataScientists.pdf)** by _Peter Bruce, Andrew Bruce, and Peter Gedeck_.

# The `common.py` module

The contents of the `common.py` module are shown below. The file `common.py` should be located in the same directory as this notebook.

```Python
# common.py

from pathlib import Path

def dataDirectory(dataDirectoryName='psds_data'):
    """
    Return the directory that contains the data.
    
    We assume that the data folder is located in a parent directory of this file and named 'psds_data'.
    If your setup is different, you will need to change this method.
    """
    dataDir = Path(__file__).resolve().parent
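    # Climb one directory at a time until a folder with the requested name is found below dataDir
    while not list(dataDir.rglob(dataDirectoryName)):
        dataDir = dataDir.parent
    found = [d for d in dataDir.rglob(dataDirectoryName) if d.is_dir()]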
    if not found:
        raise Exception(f'Cannot find data directory with name {dataDirectoryName} along the path of your source files')
    return found[0]
```
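The function above searches the directory of `common.py` recursively and, if nothing is found, moves up one level and searches again. The `while` loop has no explicit stopping point at the filesystem root, so a missing data folder can keep it running. The following is a minimal sketch of a variant that stops at the root. The name `find_data_directory`, its `start` parameter, and the temporary folder layout are assumptions made for illustration and are not part of the book's module.

```python
from pathlib import Path
import tempfile


def find_data_directory(start, name='psds_data'):
    """Search `start`, then each of its ancestors, for a directory called `name`."""
    start = Path(start).resolve()
    # [start, parent, grandparent, ..., filesystem root] is a finite list,
    # so the search always terminates.
    for base in [start, *start.parents]:
        for candidate in base.rglob(name):
            if candidate.is_dir():
                return candidate
    raise FileNotFoundError(f'Cannot find a data directory named {name!r} starting from {start}')


# Demonstration in a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / 'project' / 'psds_data').mkdir(parents=True)
    (root / 'project' / 'notebooks' / 'chapter1').mkdir(parents=True)

    # Start two levels below 'project'; the data folder is found on the way up.
    found = find_data_directory(root / 'project' / 'notebooks' / 'chapter1')
    print(found.relative_to(root))
```

A recursive search from a high-level directory can take a long time, which is one more reason to keep the data folder close to the notebook.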

# Get the path of the data folder

Run the following code to get the path of the data folder `psds_data`, which contains the datasets. The folder does not have to be a subdirectory of the folder containing this notebook, because `dataDirectory()` also searches the parent directories. Placing it there is probably the simplest arrangement, however, and it is the one used in this demonstration. If `common.py` cannot be imported, the code falls back to a `psds_data` folder inside the current working directory.

In [1]:
from pathlib import Path

try:
    import common
    DATA = common.dataDirectory()
except ImportError:
    # Fall back to a 'psds_data' folder in the current working directory
    DATA = Path().resolve() / 'psds_data'

The path of the data folder is given by:

In [2]:
DATA

WindowsPath('C:/Users/steve/Notebooks/psds_data')

You can check the folder containing this notebook, which is normally the current working directory, with the following code.

In [3]:
import os
os.getcwd()

'C:\\Users\\steve\\Notebooks'
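Because the rest of this notebook works with `pathlib`, the same check can also be written with `Path.cwd()`, which returns a `Path` object instead of a string. This is a small sketch, and the variable names are illustrative only.

```python
import os
from pathlib import Path

notebook_dir = Path.cwd()          # current working directory as a Path object
same_dir = Path(os.getcwd())       # the string from os.getcwd(), wrapped in a Path

print(notebook_dir == same_dir)    # both calls refer to the same directory
print(notebook_dir / 'psds_data')  # candidate location of the data folder
```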

# Importing the data

Now, we import the dataset `state.csv`, which is located in the data folder `psds_data`.

First, we assign the path of the dataset to a variable `STATE_CSV`.

In [4]:
STATE_CSV = DATA / 'state.csv'

In [5]:
STATE_CSV

WindowsPath('C:/Users/steve/Notebooks/psds_data/state.csv')
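Joining paths with `/` does not touch the file system, so a mistyped file name only shows up later, when the file is read. Below is a minimal sketch of an early check. The helper `require_file` is hypothetical, and a temporary folder with a placeholder file stands in for `psds_data`.

```python
from pathlib import Path
import tempfile


def require_file(path):
    """Return `path` as a Path if it points to an existing file, else raise."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f'Expected a data file at {path}')
    return path


# Temporary stand-in for the data folder (placeholder contents, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / 'state.csv').write_text('State,Population\n')

    ok = require_file(data_dir / 'state.csv')
    print(ok.name)

    try:
        require_file(data_dir / 'stat.csv')  # misspelled on purpose
    except FileNotFoundError as err:
        missing_message = str(err)
        print(missing_message)
```

In the notebook itself, the equivalent one-line check would be `STATE_CSV.is_file()`.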

Next, we read the file into a pandas DataFrame and display its first five rows.

In [6]:
import pandas as pd
state = pd.read_csv(STATE_CSV)
state.head()

| | State | Population | Murder.Rate | Abbreviation |
|---|---|---|---|---|
| 0 | Alabama | 4779736 | 5.7 | AL |
| 1 | Alaska | 710231 | 5.6 | AK |
| 2 | Arizona | 6392017 | 4.7 | AZ |
| 3 | Arkansas | 2915918 | 5.6 | AR |
| 4 | California | 37253956 | 4.4 | CA |
