# Demos 1 : Reading and Writing files in Python

### working with data need libraries

install `Faker` package

`conda install -c conda-forge faker`

or

`pip install Faker`

### Writing and reading CSVs

Writing CSVs using the Python CSV Library. Create 1,000 records data and save to CSV format

In [1]:
from faker import Faker

In [2]:
from faker import Faker
import csv

output=open('data-generator-faker.csv','w')

fake=Faker()
header=['name','age','street','city','state','zip','lng','lat']
mywriter=csv.writer(output)
mywriter.writerow(header)

for r in range(1000):
    mywriter.writerow([fake.name(),
                       fake.random_int(min=18, max=80, step=1), 
                       fake.street_address(), 
                       fake.city(),
                       fake.state(),
                       fake.zipcode(),
                       fake.longitude(),
                       fake.latitude()])
output.close()

Reading CSVs by python

In [3]:
import csv

with open('data-generator-faker.csv') as f:
    myreader=csv.DictReader(f)
    headers=next(myreader)
    for row in myreader:
        print(row['name'])

Randall Collins MD
Mary Ortiz
Charles Ellis
Renee Scott
Julie Hayes
Keith Jones
Richard Huff
Philip Perez
Mary Smith
Susan Dawson
Sandra Warren
Ashley Cooke
Julie Lane
Michael Lewis
Brian Guzman
Melinda Wilson
Beth Hill
Dr. Samantha Michael
Cameron Smith
Oscar Davis
Madeline Miller
Thomas Dennis
Wendy Pacheco
Crystal Ross
Michael Anderson
Bryan Ellis
Katherine Little
Robert Ellis
Amber Smith MD
Larry Gibson
Melissa Tanner
Anthony Black
James Medina
Sarah Adams
Stacy Brady
Allen Mack
Mark Valdez
Matthew Lopez
Mr. Daniel Anderson DDS
George Cabrera
Elizabeth Moore
Janet Duarte
Lisa Valencia
Francisco Stewart
Herbert Smith
Russell Rivers
Joseph Patton
Matthew Reyes MD
Hannah Dorsey
Matthew Gutierrez
Andrew Cowan
Gavin Richardson
Peter Burton
Monica Wong
Matthew Shaffer
Eric Wright
Marc Walker
Charles Page
Adam Peterson
Ashley Cox
Brent Diaz
Kristina Munoz
Julia Esparza
Angel King
Amber Mccullough
Michele Wagner
Linda Franklin
Philip Robinson
Katherine Nguyen DDS
Laurie Clark
Kendra Grimes

Pamela James
James Martin
Jessica Schmidt
Karen Vargas
David Rodriguez
Rebekah Sanchez
Melanie Wade
Hannah Vazquez
Sarah Sanders
Miranda Carpenter
Barbara Wallace
Peter Carrillo
Megan Gaines
Wendy Hamilton
Victor Weeks
Heather Hoover
Sierra Fernandez
Jessica Morris
Shawn Holmes
Cody Tran
Brian Klein
Anthony Burns
Kenneth Lewis
Amanda Huffman
Kendra Brown
Melody Brewer
Vanessa Mcfarland
Austin Martinez
William Flores
Miss Karen Anderson DVM
Elizabeth Thompson
Matthew Hudson
Mrs. Colleen Skinner
Steven Austin
John Price
Alex Heath
Matthew Butler
Sandra Hernandez
Samantha Williams
Brent Meyers
Joel Bailey
Clayton Sloan
Mary Terry
Michelle Andrews
Theodore Wu
Laura Evans
Barry Estrada
Ashley Reynolds
Karen Barnes MD
Donna Davis
Stephen Taylor
Melissa Nelson
Aaron Moore
Marissa Castillo
Jeffrey Harrison
Julia Anderson
Sue Rogers
Stacy Simpson
Andrew Lindsey
Reginald Rocha
Anthony Young
Joseph Wilson
Jennifer Garcia
Angela Lopez
Jonathan Martin
Wesley Tucker
Lindsay Reid
Mikayla Pratt
Chad R

### Reading and writing CSVs using pandas DataFrames

install `pandas` package

`pip install pandas`

In [4]:
import pandas as pd
df = pd.read_csv('data-generator-faker.csv')

In [5]:
df.head(10)

Unnamed: 0,name,age,street,city,state,zip,lng,lat
0,Matthew Ray,44,31550 Freeman Glens,Port Aaronhaven,Utah,77470,-179.025135,-62.190457
1,Randall Collins MD,35,82803 Alison Rest,Rachelborough,Illinois,10182,-44.609283,-28.337885
2,Mary Ortiz,34,27926 Maurice Centers,Amberbury,Minnesota,37504,-21.680833,76.561871
3,Charles Ellis,29,3564 Lee Knoll Apt. 099,East Charlesview,Kentucky,77490,-25.55056,10.615326
4,Renee Scott,23,5082 Hansen Keys Suite 360,North David,Alaska,51458,70.153317,86.222612
5,Julie Hayes,76,816 Davis Crossroad,Evansfort,Alaska,92183,-79.853413,-80.121696
6,Keith Jones,33,283 Benson Course,New Ethan,Georgia,83337,-145.779911,-26.89216
7,Richard Huff,63,0782 Nathan Springs Suite 302,Robertsmouth,Ohio,12781,102.664526,77.815565
8,Philip Perez,33,81917 Moody Union,Aaronmouth,Rhode Island,11679,-127.891762,88.536181
9,Mary Smith,55,071 Andrea Walk Suite 666,Brookeland,Oregon,39727,42.663508,0.654903


### Writing JSON with Python

Write JSON using Python and the standard library

In [8]:
from faker import Faker
import json

output=open('data-generator-faker.json','w')

fake=Faker()
alldata={}
alldata['records']=[]

for x in range(1000):
    data={"name":fake.name(),
          "age":fake.random_int(min=18, max=80, step=1),
          "street":fake.street_address(),
          "city":fake.city(),
          "state":fake.state(),
          "zip":fake.zipcode(),
          "lng":float(fake.longitude()),
          "lat":float(fake.latitude()),
          "code":fake.phone_number()}
    alldata['records'].append(data)
json.dump(alldata,output)

Reading JSON

In [9]:
import json
with open("data-generator-faker.json",'r') as f:
    data=json.load(f)
print(type(data))
print(data['records'][0]['name'])

JSONDecodeError: Expecting ',' delimiter: line 1 column 196728 (char 196727)

In [10]:
import pandas as pd
df = pd.read_json('data-generator-faker.json')

ValueError: Unexpected character found when decoding object value

## Working with Databases

### Inserting and extracting relational data in Python

1. Creating a PostgreSQL database and tables in pgadmin4 that consist of 

    a) name: text

    b) id: integer 

    c) street: text 

    d) city: text

    e) zip: text
    
    
2. Inserting data into PostgreSQL

    Installing psycopg2
    
    Connecting to PostgreSQL with Python
    
    Inserting data
    
    Inserting multiple records
    


In [1]:
import psycopg2 as db

  """)


In [3]:
conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"

conneting to PostgreSQL

In [None]:
import psycopg2 as db

conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"

conn=db.connect(conn_string)

cur=conn.cursor()


Inserting Data

In [4]:
query = "insert into users (id,name,street,city,zip) values({},'{}','{}','{}','{}')".format(1,'Big Bird','Sesame Street','Fakeville','12345')
print(cur.mogrify(query))

query2 = "insert into users (id,name,street,city,zip) values(%s,%s,%s,%s,%s)"
data=(1,'Big Bird','Sesame Street','Fakeville','12345')
print(cur.mogrify(query2,data))

cur.execute(query2,data)
conn.commit()

b"insert into users (id,name,street,city,zip) values(1,'Big Bird','Sesame Street','Fakeville','12345')"
b"insert into users (id,name,street,city,zip) values(1,'Big Bird','Sesame Street','Fakeville','12345')"


Inserting multiple records

In [7]:
import psycopg2 as db
from faker import Faker

fake=Faker()
data=[]
i=2

for r in range(1000):
    data.append((i,fake.name(),fake.street_address(), fake.city(),fake.zipcode()))
    i+=1

data_for_db=tuple(data)

print(data_for_db)

((2, 'Laura Howard', '53393 Gonzalez Court Apt. 585', 'East Lisaside', '14585'), (3, 'David Krause', '92793 Maldonado Shoals Apt. 318', 'Tinachester', '91254'), (4, 'Patty Lopez', '2918 Justin Fords', 'Port Annaview', '60563'), (5, 'Jonathan Kirk', '08151 Clark Trafficway', 'West David', '28801'), (6, 'Thomas Garcia', '0164 Jerry Camp', 'Lake Nathanberg', '36246'), (7, 'Frances Bolton', '312 Fox Camp Suite 051', 'Ashleybury', '58179'), (8, 'Diana Benitez', '342 Cynthia Springs', 'Port Anna', '44790'), (9, 'Jonathon Roberts', '22046 Cheryl Rest', 'Jameshaven', '71542'), (10, 'Hailey Hood', '47502 Robinson Junction', 'West Angel', '17768'), (11, 'Glen Schaefer', '1840 Dillon Coves Suite 538', 'New Jennifer', '18020'), (12, 'James Lee', '05448 Vazquez Loop', 'North Raymondmouth', '32981'), (13, 'Dr. Holly Ferguson DVM', '28892 Lisa Court Apt. 990', 'West Beverly', '12830'), (14, 'Juan Dudley', '20335 Reed Branch Suite 570', 'Lake Elijah', '04086'), (15, 'Amy Gonzalez', '2694 Shaw Streets'

In [10]:
import psycopg2 as db
from faker import Faker

fake=Faker()
data=[]
i=2

for r in range(1000):
    data.append((i,fake.name(),fake.street_address(), fake.city(),fake.zipcode()))
    i+=1

data_for_db=tuple(data)

# print(data_for_db)

conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"
conn=db.connect(conn_string)
cur=conn.cursor()

query = "insert into users (id,name,street,city,zip) values(%s,%s,%s,%s,%s)"
# print(cur.mogrify(query,data_for_db[1]))

cur.executemany(query,data_for_db)
conn.commit()
query2 = "select * from users"

cur.execute(query2)
print(cur.fetchall())

[('Big Bird', 1, 'Sesame Street', 'Fakeville', '12345'), ('Big Bird', 1, 'Sesame Street', 'Fakeville', '12345'), ('Patricia Banks', 2, '0376 Victoria Track Apt. 763', 'Kimberlyville', '55976'), ('Tom Henry', 3, '5749 Jeffery Shoals', 'South Rebecca', '65766'), ('Sabrina Morris', 4, '3750 Arias Lake', 'East Tara', '47330'), ('Timothy Torres', 5, '926 Rogers Mount Suite 596', 'Donnaville', '84758'), ('Julia Summers', 6, '09367 Henry Lane Suite 585', 'North Eric', '63237'), ('Tracy Nixon', 7, '66842 Campbell Drive', 'South Pamelaland', '02829'), ('Neil Lewis', 8, '98437 Miller Isle', 'Joseside', '20862'), ('Kevin Fox', 9, '901 Martin Oval', 'Janiceton', '79701'), ('Travis Rodriguez', 10, '608 Richard Brook Suite 329', 'North Edward', '88770'), ('Regina Sherman', 11, '4101 Riddle Haven Apt. 568', 'Lake Coryfort', '08480'), ('Rodney Perez', 12, '09430 Payne Road Apt. 621', 'Port Jennifer', '63204'), ('Michelle Lopez', 13, '69153 Jessica Locks', 'Lopezhaven', '87036'), ('Janet Griffith', 14,

### Extracting data from PostgreSQL

In [12]:
import psycopg2 as db

conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"
conn=db.connect(conn_string)
cur=conn.cursor()

query = "select * from users"
cur.execute(query)

print(cur.fetchone())
print(cur.rowcount)
print(cur.rownumber)
print(cur.fetchmany(3))
print(cur.rownumber)

f=open('fromdb.csv','w')
conn=db.connect(conn_string)
cur=conn.cursor()
cur.copy_to(f,'users',sep=',')
f.close()

f=open('fromdb.csv','r')
print(f.read())

('Big Bird', 1, 'Sesame Street', 'Fakeville', '12345')
4002
1
[('Big Bird', 1, 'Sesame Street', 'Fakeville', '12345'), ('Patricia Banks', 2, '0376 Victoria Track Apt. 763', 'Kimberlyville', '55976'), ('Tom Henry', 3, '5749 Jeffery Shoals', 'South Rebecca', '65766')]
4
Big Bird,1,Sesame Street,Fakeville,12345
Big Bird,1,Sesame Street,Fakeville,12345
Patricia Banks,2,0376 Victoria Track Apt. 763,Kimberlyville,55976
Tom Henry,3,5749 Jeffery Shoals,South Rebecca,65766
Sabrina Morris,4,3750 Arias Lake,East Tara,47330
Timothy Torres,5,926 Rogers Mount Suite 596,Donnaville,84758
Julia Summers,6,09367 Henry Lane Suite 585,North Eric,63237
Tracy Nixon,7,66842 Campbell Drive,South Pamelaland,02829
Neil Lewis,8,98437 Miller Isle,Joseside,20862
Kevin Fox,9,901 Martin Oval,Janiceton,79701
Travis Rodriguez,10,608 Richard Brook Suite 329,North Edward,88770
Regina Sherman,11,4101 Riddle Haven Apt. 568,Lake Coryfort,08480
Rodney Perez,12,09430 Payne Road Apt. 621,Port Jennifer,63204
Michelle Lopez,13,6

### Extracting data with DataFrames

In [14]:
import psycopg2 as db
import pandas as pd

conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"
conn=db.connect(conn_string)

df=pd.read_sql("select * from users", conn)

df.head(10)
# print(df.head())
# print(df['city'].value_counts())

             name  id                        street           city    zip
0        Big Bird   1                 Sesame Street      Fakeville  12345
1        Big Bird   1                 Sesame Street      Fakeville  12345
2  Patricia Banks   2  0376 Victoria Track Apt. 763  Kimberlyville  55976
3       Tom Henry   3           5749 Jeffery Shoals  South Rebecca  65766
4  Sabrina Morris   4               3750 Arias Lake      East Tara  47330
Port Jennifer          8
North Michael          8
West Robert            6
East John              5
New William            5
West Michael           5
Smithmouth             4
East James             4
South Michael          4
New Christopher        4
Lake James             4
Lake David             4
Port Christopher       4
North David            4
Port James             4
South Michelle         4
Lake Kelly             4
Johnhaven              4
Lake Stephanie         4
East Robert            4
New James              4
Port Sarah             4
Joseph

In [15]:
df.head(10)

Unnamed: 0,name,id,street,city,zip
0,Big Bird,1,Sesame Street,Fakeville,12345
1,Big Bird,1,Sesame Street,Fakeville,12345
2,Patricia Banks,2,0376 Victoria Track Apt. 763,Kimberlyville,55976
3,Tom Henry,3,5749 Jeffery Shoals,South Rebecca,65766
4,Sabrina Morris,4,3750 Arias Lake,East Tara,47330
5,Timothy Torres,5,926 Rogers Mount Suite 596,Donnaville,84758
6,Julia Summers,6,09367 Henry Lane Suite 585,North Eric,63237
7,Tracy Nixon,7,66842 Campbell Drive,South Pamelaland,2829
8,Neil Lewis,8,98437 Miller Isle,Joseside,20862
9,Kevin Fox,9,901 Martin Oval,Janiceton,79701
