# Demos 1 : Reading and Writing files in Python

### working with data need libraries

install `Faker` package

`conda install -c conda-forge faker`

or

`pip install Faker`

### Writing and reading CSVs

Writing CSVs using the Python CSV Library. Create 1,000 records data and save to CSV format

In [24]:
from faker import Faker

In [25]:
from faker import Faker
import csv

output=open('data-generator-faker.csv','w')

fake=Faker()
header=['name','age','street','city','state','zip','lng','lat']
mywriter=csv.writer(output)
mywriter.writerow(header)

for r in range(1000):
    mywriter.writerow([fake.name(),
                       fake.random_int(min=18, max=80, step=1), 
                       fake.street_address(), 
                       fake.city(),
                       fake.state(),
                       fake.zipcode(),
                       fake.longitude(),
                       fake.latitude()])
output.close()

Reading CSVs by python

In [26]:
import csv

with open('data-generator-faker.csv') as f:
    myreader=csv.DictReader(f)
    headers=next(myreader)
    for row in myreader:
        print(row['name'])

Bethany Martinez
Sarah Perkins
Matthew Brown
Evelyn Thomas
Derrick Montgomery
Tracy Roth
Samantha Henderson
William Burgess
Alexis Leach
Julia Watkins
Clarence Jones
Dr. Jerry Hanson
Kristi Eaton
Trevor Johnson
William Nicholson
Tara Harper
Edwin Thompson
Tracy Chapman
Tyler Dillon
Melissa Smith
Samantha Phillips
Jennifer Peterson
Tricia Robbins
Dustin Perez
Helen Floyd
Kathryn Pittman
Kyle Best
Jason Morgan
Christopher Velasquez
Maurice Hensley
Cody Barrett
Harold Swanson
Ms. Cassidy Kline
Drew Shaffer
Andrew Brock
Brandon Rowe
Michael Becker
William Kemp Jr.
Jeremy Clark
Kristen King
Amy Fletcher
Michael Stein
Justin Torres
Jeffery Richard
Christopher Fitzgerald
Ana Ortiz
Matthew Massey
Dana Snyder
Michael Cunningham
Richard Robbins
Adrian Kane
Sheena Bell
Erica Freeman
Phillip Lee
Kelly Leonard
Taylor Coleman
Peter Richards
Christian Owens
David Stone
Alan Gill
Danny Cruz
Darrell Anderson
Nicole Walls
Michael Gonzales
Kyle Herman
Rachel Reynolds
Melanie Sanchez
Julie Reynolds
Brian 

### Reading and writing CSVs using pandas DataFrames

install `pandas` package

`pip install pandas`

In [27]:
import pandas as pd
df = pd.read_csv('data-generator-faker.csv')

In [28]:
df.head(10)

Unnamed: 0,name,age,street,city,state,zip,lng,lat
0,Wendy Peters,70,668 Robert Junctions Suite 919,Williamsmouth,Georgia,45227,-159.668739,31.029393
1,Bethany Martinez,73,13242 Cox Meadow,Lake Dylan,Indiana,42157,21.483181,-55.366385
2,Sarah Perkins,20,4220 Matthew Island,West Michaeltown,Utah,79262,95.539944,6.915683
3,Matthew Brown,18,6628 Flores Dam Apt. 186,New Ryan,South Dakota,46123,-125.118068,6.593683
4,Evelyn Thomas,39,11873 Adam Expressway,East Samuelland,Mississippi,80578,46.444284,-76.839136
5,Derrick Montgomery,69,9988 Sanford Walks,Bettyhaven,Wisconsin,69524,-133.131484,-52.598728
6,Tracy Roth,60,41241 Jimmy Cape,Robertberg,Utah,96292,-14.315962,88.745744
7,Samantha Henderson,50,942 Cathy Mountains,Davidview,Hawaii,38550,-53.322235,-75.068396
8,William Burgess,43,968 Antonio Fort,Colestad,South Carolina,14858,164.142611,-84.99173
9,Alexis Leach,39,31066 Miller Tunnel Suite 310,Samanthachester,Alabama,41174,72.043399,72.305274


### Writing JSON with Python

Write JSON using Python and the standard library

In [29]:
from faker import Faker
import json

output=open('data-generator-faker.json','w')

fake=Faker()
alldata={}
alldata['records']=[]

for x in range(1000):
    data={"name":fake.name(),
          "age":fake.random_int(min=18, max=80, step=1),
          "street":fake.street_address(),
          "city":fake.city(),
          "state":fake.state(),
          "zip":fake.zipcode(),
          "lng":float(fake.longitude()),
          "lat":float(fake.latitude()),
          "code":fake.phone_number()}
    alldata['records'].append(data)
json.dump(alldata,output)

Reading JSON

In [30]:
import json
with open("data-generator-faker.json",'r') as f:
    data=json.load(f)
print(type(data))
print(data['records'][0]['name'])

JSONDecodeError: Expecting value: line 1 column 188367 (char 188366)

In [31]:
import pandas as pd
df = pd.read_json('data-generator-faker.json')

ValueError: Expected object or value

## Working with Databases

### Inserting and extracting relational data in Python

1. Creating a PostgreSQL database and tables in pgadmin4 that consist of 

    a) name: text

    b) id: integer 

    c) street: text 

    d) city: text

    e) zip: text
    
    
2. Inserting data into PostgreSQL

    Installing psycopg2
    
    Connecting to PostgreSQL with Python
    
    Inserting data
    
    Inserting multiple records
    


In [1]:
import psycopg2 as db

In [8]:
conn_string="dbname='aj.kholid' host='localhost' user='postgres' password='admin'"

conneting to PostgreSQL

In [9]:
import psycopg2 as db

conn_string="dbname='aj.kholid' host='localhost' user='postgres' password='admin'"

conn=db.connect(conn_string)

cur=conn.cursor()


OperationalError: could not connect to server: Connection refused (0x0000274D/10061)
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5432?
could not connect to server: Connection refused (0x0000274D/10061)
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?


Inserting Data

In [None]:
query = "insert into users (id,name,street,city,zip) values({},'{}','{}','{}','{}')".format(1,'Big Bird','Sesame Street','Fakeville','12345')
print(cur.mogrify(query))

query2 = "insert into users (id,name,street,city,zip) values(%s,%s,%s,%s,%s)"
data=(1,'Big Bird','Sesame Street','Fakeville','12345')
print(cur.mogrify(query2,data))

cur.execute(query2,data)
conn.commit()

Inserting multiple records

In [None]:
import psycopg2 as db
from faker import Faker

fake=Faker()
data=[]
i=2

for r in range(1000):
    data.append((i,fake.name(),fake.street_address(), fake.city(),fake.zipcode()))
    i+=1

data_for_db=tuple(data)

print(data_for_db)

In [None]:
import psycopg2 as db
from faker import Faker

fake=Faker()
data=[]
i=2

for r in range(1000):
    data.append((i,fake.name(),fake.street_address(), fake.city(),fake.zipcode()))
    i+=1

data_for_db=tuple(data)

# print(data_for_db)

conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"
conn=db.connect(conn_string)
cur=conn.cursor()

query = "insert into users (id,name,street,city,zip) values(%s,%s,%s,%s,%s)"
# print(cur.mogrify(query,data_for_db[1]))

cur.executemany(query,data_for_db)
conn.commit()
query2 = "select * from users"

cur.execute(query2)
print(cur.fetchall())

### Extracting data from PostgreSQL

In [None]:
import psycopg2 as db

conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"
conn=db.connect(conn_string)
cur=conn.cursor()

query = "select * from users"
cur.execute(query)

print(cur.fetchone())
print(cur.rowcount)
print(cur.rownumber)
print(cur.fetchmany(3))
print(cur.rownumber)

f=open('fromdb.csv','w')
conn=db.connect(conn_string)
cur=conn.cursor()
cur.copy_to(f,'users',sep=',')
f.close()

f=open('fromdb.csv','r')
print(f.read())

### Extracting data with DataFrames

In [None]:
import psycopg2 as db
import pandas as pd

conn_string="dbname='dataengineering' host='localhost' user='postgres' password='admin'"
conn=db.connect(conn_string)

df=pd.read_sql("select * from users", conn)

df.head(10)
# print(df.head())
# print(df['city'].value_counts())

In [None]:
df.head(10)