# 1. Comma-Separated Files

**Reading a CSV file with pandas**

`pd.read_csv` takes several parameters that control how the file is parsed:

* **sep** - the field delimiter. It defaults to a comma, but any delimiter can be specified. For example, a comma is a poor delimiter if some of your columns contain commas; a | may be a better choice.

* **header** - which row (if any) contains the column names.

* **names** - the column names to use.
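As a minimal sketch of how these three parameters work together, the example below parses a small pipe-delimited string. The sample rows and the column names are made up for illustration, and `io.StringIO` stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited data with no header row.
# Note the comma inside the first field, which would break a comma delimiter.
raw_text = "Smith, John|42|Private\nDoe, Jane|37|State-gov\n"

# sep="|" splits fields on pipes, header=None says there is no header row,
# and names supplies the column labels to use instead.
people = pd.read_csv(
    io.StringIO(raw_text),
    sep="|",
    header=None,
    names=["name", "age", "workclass"],
)

print(people)
```

Because the delimiter is a pipe, the comma in each name stays inside a single field.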

In [2]:
import pandas as pd

names = ['age', 'workclass', 'fnlwgt', 'education', 'educationnum', 'maritalstatus', 'occupation', 'relationship', 'race',
        'sex', 'capitalgain', 'capitalloss', 'hoursperweek', 'nativecountry', 'label']

df = pd.read_csv("./data/adult.data", header=None, names=names)
                      
print(df)  # df.head() would show only the first five rows

       age          workclass  fnlwgt    education  educationnum  \
0       39          State-gov   77516    Bachelors            13   
1       50   Self-emp-not-inc   83311    Bachelors            13   
2       38            Private  215646      HS-grad             9   
3       53            Private  234721         11th             7   
4       28            Private  338409    Bachelors            13   
...    ...                ...     ...          ...           ...   
32556   27            Private  257302   Assoc-acdm            12   
32557   40            Private  154374      HS-grad             9   
32558   58            Private  151910      HS-grad             9   
32559   22            Private  201490      HS-grad             9   
32560   52       Self-emp-inc  287927      HS-grad             9   

             maritalstatus          occupation    relationship    race  \
0            Never-married        Adm-clerical   Not-in-family   White   
1    

# 2. JSON Files

## Introduction to JSON files

**JSON (JavaScript Object Notation) is a popular format that allows a more flexible schema than CSV: values can be nested objects and arrays rather than flat rows. It is also easy for humans to read and write, and much of the data sent around the web is transmitted as JSON. Here is an example:**

```json
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
```

## Reading JSON file with Python

In [3]:
import json

# Define the JSON object as a string
json_string = """{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}"""


# Read the JSON data into Python
json_data = json.loads(json_string)

print(json_data)

{'glossary': {'title': 'example glossary', 'GlossDiv': {'title': 'S', 'GlossList': {'GlossEntry': {'ID': 'SGML', 'SortAs': 'SGML', 'GlossTerm': 'Standard Generalized Markup Language', 'Acronym': 'SGML', 'Abbrev': 'ISO 8879:1986', 'GlossDef': {'para': 'A meta-markup language, used to create markup languages such as DocBook.', 'GlossSeeAlso': ['GML', 'XML']}, 'GlossSee': 'markup'}}}}}
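Once parsed, the data is an ordinary nested structure of Python dicts and lists. The sketch below uses a trimmed-down copy of the glossary (shortened here only for illustration) to show how nested values can be reached and how `json.dumps` converts the object back into a string.

```python
import json

# A trimmed-down version of the glossary document, kept short for illustration.
json_string = """{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "GlossDef": {"GlossSeeAlso": ["GML", "XML"]}
                }
            }
        }
    }
}"""

json_data = json.loads(json_string)

# JSON objects become dicts, so nested values are reached by chaining keys.
entry = json_data["glossary"]["GlossDiv"]["GlossList"]["GlossEntry"]
print(entry["ID"])  # SGML

# JSON arrays become lists and support normal indexing.
see_also = entry["GlossDef"]["GlossSeeAlso"]
print(see_also[0])  # GML

# json.dumps turns the Python object back into a JSON string.
round_trip = json.dumps(json_data, indent=4)
print(round_trip)
```

To read from a file instead of a string, `json.load` accepts an open file object in place of the string passed to `json.loads`.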


# 3. Raw Files

## Introduction to raw files

Sometimes data arrives in an unusual format and you have to write your own Python code to process it. Fortunately, this is usually simple.

For this, we will assume that the data is in a text file and that each line of the file holds one row of data.

```raw
James|22|M
Sarah|31|F
Mindy|25|F
```

In [5]:
import tempfile

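# Reopening the temporary file by name works on Unix-like systems;
# on Windows it can fail while tmp is still open.
tmp = tempfile.NamedTemporaryFile()

# Open the file for writing and write the data.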
with open(tmp.name, 'w') as f:
    f.write("James|22|M\n")
    f.write("Sarah|31|F\n")
    f.write("Mindy|25|F")

# Read in the data from our file, line by line.
# Lines keep their trailing "\n", so print() leaves blank lines between rows.
with open(tmp.name, "r") as f:
    for line in f:
        print(line)

James|22|M

Sarah|31|F

Mindy|25|F
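The blank lines in the output above appear because each line read from the file still ends with a newline and `print` adds another. A minimal sketch of one way to avoid this is to strip the newline before printing. The file name `people.txt` and the use of `tempfile.TemporaryDirectory` are choices made for this example only.

```python
import os
import tempfile

# Write the sample rows to a file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("James|22|M\nSarah|31|F\nMindy|25|F")

    cleaned_lines = []
    with open(path, "r") as f:
        for line in f:
            # rstrip("\n") drops the trailing newline that each line keeps.
            cleaned_lines.append(line.rstrip("\n"))

for line in cleaned_lines:
    print(line)
```

Writing to a named path inside a temporary directory also sidesteps reopening an already open temporary file, which is not portable across operating systems.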


In [6]:
import tempfile

tmp = tempfile.NamedTemporaryFile()

# Open the file for writing and write our data
with open(tmp.name, 'w') as f:
    f.write("James|22|M\n")
    f.write("Sarah|31|F\n")
    f.write("Mindy|25|F")

first_values = []  # Define a list to store the first values of each row
with open(tmp.name, "r") as f:  # Open the file to read
    for line in f:  # Loop over each line
        row_values = line.split("|")  # Split each line by the | character into a list
        first_values.append(row_values[0])  # Add the first value to our list
      
print(first_values)

['James', 'Sarah', 'Mindy']
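The same splitting approach can be extended to keep every field of each row. The sketch below is one possible way to do it, assuming the three fields are a name, an age, and a sex code. Those field names are not given in the data and are chosen here for illustration.

```python
# The same rows as above, held in a list so the example is self-contained.
raw_lines = ["James|22|M\n", "Sarah|31|F\n", "Mindy|25|F"]

people = []
for line in raw_lines:
    # strip() removes the trailing newline so the last field stays clean.
    name, age, sex = line.strip().split("|")
    # Convert the age from text to an integer so it can be used in arithmetic.
    people.append({"name": name, "age": int(age), "sex": sex})

print(people)
```

Without the `strip()` call, the last field of every line except the final one would keep its newline, for example `"M\n"` instead of `"M"`.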


# 4. Exercise: Reading Auto MPG Dataset

**As an exercise, read the Auto MPG dataset. The dataset's documentation includes the Attribute Information, which lists the column names. Replace any whitespace in the column names with _.**
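The renaming step can be sketched separately from reading the file. The attribute names below are assumed to match the dataset documentation, with spaces left in, and `str.replace` is just one way to substitute the underscores.

```python
# Attribute names as assumed from the dataset documentation (spaces included).
raw_names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
             "acceleration", "model year", "origin", "car name"]

# Replace each space with an underscore to get usable column names.
names = [name.replace(" ", "_") for name in raw_names]

print(names)
```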

In [10]:
import pandas as pd

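def read_csv():
    # Column names from the Attribute Information, with whitespace replaced by _
    names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
             'acceleration', 'model_year', 'origin', 'car_name']
    # The file is whitespace-delimited; sep=r'\s+' replaces delim_whitespace=True,
    # which is deprecated in recent pandas versions.
    df = pd.read_csv('./data/auto-mpg.data', header=None, names=names, sep=r'\s+')
    return df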

print(read_csv())

      mpg  cylinders  displacement horsepower  weight  acceleration  \
0    18.0          8         307.0      130.0  3504.0          12.0   
1    15.0          8         350.0      165.0  3693.0          11.5   
2    18.0          8         318.0      150.0  3436.0          11.0   
3    16.0          8         304.0      150.0  3433.0          12.0   
4    17.0          8         302.0      140.0  3449.0          10.5   
..    ...        ...           ...        ...     ...           ...   
393  27.0          4         140.0      86.00  2790.0          15.6   
394  44.0          4          97.0      52.00  2130.0          24.6   
395  32.0          4         135.0      84.00  2295.0          11.6   
396  28.0          4         120.0      79.00  2625.0          18.6   
397  31.0          4         119.0      82.00  2720.0          19.4   

     model_year  origin                   car_name  
0            70       1  chevrolet chevelle malibu  
1            70       1          buick sk