# Data Analysis in Base Python

## Loading and Editing Data Files
### * Types of Data Files
* CSV Files (comma-separated values)
* JSON Files (JavaScript Object Notation)

### * Adding `!` before a code line runs it as a Unix shell command (in Jupyter/IPython)

In [1]:
! ls

Advertising.csv        week1day1.ipynb        week1day3.ipynb
Pre-Work Review .ipynb week1day2.ipynb


### * Opening and Reading a Text File

In [9]:
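file_path = 'zen_of_python.txt'
file_obj = open(file_path)              # open for reading (default mode 'r')
file_contents = file_obj.readlines()    # list of lines, each ending in '\n'
for line in file_contents:
    print(line.rstrip('\n'))            # drop the newline so print() doesn't double-space

# Alternatively, a `with` block closes the file automatically when it ends:
# with open(file_path) as file_obj:
#     file_contents = file_obj.readlines()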

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### * Writing to a File

In [8]:
# mode='w' creates the file if it does not exist and overwrites it if it does

output_file_obj = open('zen_of_python_2.txt', mode='w')

In [10]:
for line in file_contents:
    output_file_obj.write(line)

* Close each file when you are finished: this releases the file handle and makes sure any buffered data is written to disk

In [11]:
file_obj.close()
output_file_obj.close()
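A `with` block (the alternative noted in the reading cell above) closes a file automatically when the block ends, even if an error is raised inside it. A minimal sketch that writes two lines of the Zen to a hypothetical file in a temporary directory, so the notebook's own files are left alone:

```python
import os
import tempfile

lines = ['Beautiful is better than ugly.\n', 'Explicit is better than implicit.\n']

# Hypothetical output path inside a fresh temporary directory
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'zen_sample.txt')

# mode='w' creates the file (or truncates an existing one);
# leaving the with-block closes it and flushes buffered data to disk
with open(out_path, mode='w') as out_file:
    out_file.writelines(lines)

print(out_file.closed)  # True: the file object was closed when the block exited

# Read it back to confirm the data reached the file
with open(out_path) as in_file:
    print(in_file.read())
```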

* Other options for reading a file:
    * `.read()` reads the entire contents into a single string
    * `.read(size)` reads up to `size` characters into a single string
    * `.readline()` reads just one line of the file into a string
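A short sketch of these three methods on a small, hypothetical two-line file created in a temporary directory. Each call continues from wherever the previous read on the same file object stopped:

```python
import os
import tempfile

# Build a small sample file in a temporary directory (hypothetical contents)
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('Readability counts.\nNow is better than never.\n')

with open(path) as f:
    whole = f.read()             # the entire file as one string

with open(path) as f:
    first_eight = f.read(8)      # at most 8 characters
    rest_of_line = f.readline()  # from the current position to the end of that line
    next_line = f.readline()     # the following line, including its '\n'
    at_end = f.readline()        # an empty string signals end of file

print(repr(whole))
print(repr(first_eight), repr(rest_of_line), repr(next_line), repr(at_end))
```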

## Loading and Editing CSV Files

In [2]:
import csv

with open("Advertising.csv", newline='') as f:
    # Pass the file to a csv "reader" object. QUOTE_NONNUMERIC converts every
    # unquoted field to float; quoted fields (here, the header and the row
    # labels) are left as strings
    reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    # Get all of the rows from the reader using `list`
    advertising_with_csv_module = list(reader)
    
advertising_with_csv_module

[['', 'TV', 'Radio', 'Newspaper', 'Sales'],
 ['1', 230.1, 37.8, 69.2, 22.1],
 ['2', 44.5, 39.3, 45.1, 10.4],
 ['3', 17.2, 45.9, 69.3, 9.3],
 ['4', 151.5, 41.3, 58.5, 18.5],
 ['5', 180.8, 10.8, 58.4, 12.9],
 ['6', 8.7, 48.9, 75.0, 7.2],
 ['7', 57.5, 32.8, 23.5, 11.8],
 ['8', 120.2, 19.6, 11.6, 13.2],
 ['9', 8.6, 2.1, 1.0, 4.8],
 ['10', 199.8, 2.6, 21.2, 10.6],
 ['11', 66.1, 5.8, 24.2, 8.6],
 ['12', 214.7, 24.0, 4.0, 17.4],
 ['13', 23.8, 35.1, 65.9, 9.2],
 ['14', 97.5, 7.6, 7.2, 9.7],
 ['15', 204.1, 32.9, 46.0, 19.0],
 ['16', 195.4, 47.7, 52.9, 22.4],
 ['17', 67.8, 36.6, 114.0, 12.5],
 ['18', 281.4, 39.6, 55.8, 24.4],
 ['19', 69.2, 20.5, 18.3, 11.3],
 ['20', 147.3, 23.9, 19.1, 14.6],
 ['21', 218.4, 27.7, 53.4, 18.0],
 ['22', 237.4, 5.1, 23.5, 12.5],
 ['23', 13.2, 15.9, 49.6, 5.6],
 ['24', 228.3, 16.9, 26.2, 15.5],
 ['25', 62.3, 12.6, 18.3, 9.7],
 ['26', 262.9, 3.5, 19.5, 12.0],
 ['27', 142.9, 29.3, 12.6, 15.0],
 ['28', 240.1, 16.7, 22.9, 15.9],
 ['29', 248.8, 27.1, 22.9, 18.9],
 ['30', 7
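To see what `QUOTE_NONNUMERIC` does without needing the file, the sketch below feeds the first three rows shown above through the same reader via `io.StringIO`. It assumes that, as the output suggests, the header and row labels are quoted in `Advertising.csv` while the numbers are not. It then averages one column in base Python:

```python
import csv
import io

# Stand-in for Advertising.csv: the first three rows shown above.
# Assumption: header and row labels are quoted, numbers are not.
sample = '''"","TV","Radio","Newspaper","Sales"
"1",230.1,37.8,69.2,22.1
"2",44.5,39.3,45.1,10.4
"3",17.2,45.9,69.3,9.3
'''

# QUOTE_NONNUMERIC: quoted fields stay str, unquoted fields become float
rows = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONNUMERIC))
header, data = rows[0], rows[1:]

# Pull out one column by name and average it without any libraries
tv_idx = header.index('TV')
tv_values = [row[tv_idx] for row in data]
tv_mean = sum(tv_values) / len(tv_values)

print(header)
print(tv_values, round(tv_mean, 2))
```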

## Loading and Editing JSON Files

In [19]:
import json
import pandas as pd

with open('iris.json') as f:
    data = json.load(f)

df = pd.DataFrame(data)
df


,sepalLength,sepalWidth,petalLength,petalWidth,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica
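`json.load` returns ordinary Python lists and dicts, so the data can also be summarised without pandas. A sketch, assuming `iris.json` is a list of records (one object per flower) with the keys shown in the DataFrame; the four records below are copied from that output and parsed with `json.loads`, the string counterpart of `json.load`:

```python
import json

# Stand-in for iris.json (assumed to be a list of records):
# four rows copied from the DataFrame output above
raw = '''[
  {"sepalLength": 5.1, "sepalWidth": 3.5, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.9, "sepalWidth": 3.0, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 6.7, "sepalWidth": 3.0, "petalLength": 5.2, "petalWidth": 2.3, "species": "virginica"},
  {"sepalLength": 6.3, "sepalWidth": 2.5, "petalLength": 5.0, "petalWidth": 1.9, "species": "virginica"}
]'''
data = json.loads(raw)  # json.loads parses a string; json.load reads a file object

# Group sepal lengths by species
by_species = {}
for record in data:
    by_species.setdefault(record['species'], []).append(record['sepalLength'])

# Average each group with a dict comprehension
mean_sepal = {sp: sum(vals) / len(vals) for sp, vals in by_species.items()}
print(mean_sepal)
```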


## Outputting to JSON

In [None]:
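# Write a Python object back out as JSON. `data` is the object loaded from
# iris.json above; the output filename is a placeholder
with open('iris_output.json', 'w') as f:
    json.dump(data, f)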
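`json.dump` writes to an open file, while `json.dumps` returns the JSON as a string; `indent` pretty-prints either one. A sketch of a round trip, using a hypothetical record and filename:

```python
import json
import os
import tempfile

record = {'species': 'setosa', 'sepalLength': 5.1}  # hypothetical object to save

# json.dumps returns a string; indent=2 pretty-prints it
text = json.dumps(record, indent=2)

# json.dump writes the same JSON straight to an open file
path = os.path.join(tempfile.mkdtemp(), 'record.json')  # hypothetical filename
with open(path, 'w') as f:
    json.dump(record, f, indent=2)

# Reading it back with json.load gives an equal dict
with open(path) as f:
    restored = json.load(f)

print(text)
print(restored == record)
```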

_____________________________________________________________________________

# Comprehensions

## List Comprehensions

In [3]:
# Pull the odd numbers from a set into a new list
nums = set(range(1000))
odds = [num for num in nums if num % 2 == 1]
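A list comprehension has the shape `[expression for item in iterable if condition]`. The explicit loop below builds the same list, which is one way to read the comprehension above:

```python
nums = set(range(1000))

# Comprehension: [expression for item in iterable if condition]
odds = [num for num in nums if num % 2 == 1]

# The same result built with an explicit loop
odds_loop = []
for num in nums:
    if num % 2 == 1:
        odds_loop.append(num)

# Both hold the same 500 odd numbers
print(sorted(odds) == sorted(odds_loop), len(odds))
```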

In [4]:
# Pull the first letter from each element
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
         'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

first_c = [word[0] for word in words]
first_c

['c', 'o', 'm', 'p', 'r', 'e', 'h', 'e', 'n', 's', 'i', 'o', 'n']
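Filtering and transforming can be combined in one comprehension. A sketch (the `'ium'` filter is just an illustration) that keeps only the first letters of words ending in `'ium'`. Note that an `if ... else` placed *before* `for` is a conditional expression that transforms every item rather than filtering:

```python
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
         'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

# Filter (trailing if) and transform (word[0]) in one pass
ium_initials = [word[0] for word in words if word.endswith('ium')]

# Conditional expression before 'for': every word produces an item
flags = ['ium' if word.endswith('ium') else '-' for word in words]

print(ium_initials)
print(flags)
```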

## Dictionary Comprehensions

In [5]:
{k: v for k, v in zip(range(5), range(0, 10, 2))}

{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
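`zip` pairs items by position and stops at the shorter input. For a plain pairing, `dict(zip(...))` gives the same result as the comprehension above; the comprehension earns its keep once you filter or transform, as in this sketch:

```python
keys = range(5)
values = range(0, 10, 2)

# zip pairs items positionally: (0, 0), (1, 2), (2, 4), (3, 6), (4, 8)
print(list(zip(keys, values)))

# For plain pairing, the comprehension and dict() agree
from_comp = {k: v for k, v in zip(keys, values)}
from_dict = dict(zip(keys, values))
print(from_comp == from_dict)

# Something dict() alone can't do: keep only pairs whose value exceeds 2
filtered = {k: v for k, v in zip(keys, values) if v > 2}
print(filtered)
```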

In [6]:
scores = [.858, .873, .868]
{'model' + str(j+1): scores[j] for j in range(3)}

{'model1': 0.858, 'model2': 0.873, 'model3': 0.868}
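Indexing with `range(3)` and `j+1` works, but `enumerate` with `start=1` yields the counter and the value together. A sketch of that alternative, plus inverting the mapping to look up the best-scoring model (this assumes the scores are unique, since duplicate keys would overwrite each other):

```python
scores = [.858, .873, .868]

# enumerate(..., start=1) yields (1, 0.858), (2, 0.873), (3, 0.868)
by_model = {f'model{i}': score for i, score in enumerate(scores, start=1)}

# Invert the mapping: score -> model name (assumes unique scores)
by_score = {score: name for name, score in by_model.items()}

print(by_model)
print(by_score[max(by_score)])  # name of the highest-scoring model
```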

## Nested Structures

In [7]:
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
             {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
             {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
             {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
             {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}]

In [12]:

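# Pull each person's name and home number out of the nested dicts
name_list = [dict_['name'] for dict_ in phone_nos]
num_list = [dict_['nums']['home'] for dict_ in phone_nos]
# Pair them up: a list of one-entry dicts, {name: home number}
fin_dict = [{k: v} for k, v in zip(name_list, num_list)]
fin_dict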


[{'greg': 1234567},
 {'max': 9876543},
 {'erin': 3333333},
 {'joél': 2222222},
 {'sean': 9999999}]
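The cell above produces a *list* of one-entry dicts, despite the name `fin_dict`. If a single lookup table is wanted instead, one dict comprehension can read the nested values directly. A sketch using the first three records of `phone_nos`:

```python
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
             {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
             {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}}]

# One dict mapping each name to their home number, read from the nested records
home_by_name = {entry['name']: entry['nums']['home'] for entry in phone_nos}

# Nested comprehension: a dict of dicts keeping every number type per person
all_by_name = {entry['name']: {kind: num for kind, num in entry['nums'].items()}
               for entry in phone_nos}

print(home_by_name)
print(all_by_name['max']['work'])
```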