# Introduction to Python  

## Data Persistence with Python

+ #### _file_
+ #### _pickle_
+ #### _dill_
+ #### _json_

In [117]:
import os
import pickle
import dill 

## [_file/open_](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files)

### open() returns a file object, and is most commonly used with two arguments: open(filename, mode).  

<table>
        <tbody>
            <tr>
                <th>Mode</th>
                <th>Description</th>
            </tr>
            <tr>
                <td><code>'r'</code></td>
                <td>Open a file for reading. (default)</td>
            </tr>
            <tr>
                <td><code>'w'</code></td>
                <td>Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists.</td>
            </tr>
            <tr>
                <td><code>'x'</code></td>
                <td>Open a file for exclusive creation. If the file already exists, the operation fails.</td>
            </tr>
            <tr>
                <td><code>'a'</code></td>
                <td>Open for appending at the end of the file without truncating it. Creates a new file if it does not exist.</td>
            </tr>
            <tr>
                <td><code>'t'</code></td>
                <td>Open in text mode. (default)</td>
            </tr>
            <tr>
                <td><code>'b'</code></td>
                <td>Open in binary mode.</td>
            </tr>
            <tr>
                <td><code>'+'</code></td>
                <td>Open a file for updating (reading and writing)</td>
            </tr>
        </tbody>
    </table>
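
The differences between the modes in the table are easiest to see side by side. The following is a minimal sketch, assuming a throwaway directory and a hypothetical file name (`modes_demo.txt`) that are not part of this notebook:

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    # 'x' creates the file, and fails if it is already there.
    with open(path, mode='x', encoding='utf-8') as f:
        f.write('first\n')
    try:
        open(path, mode='x', encoding='utf-8')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'a' keeps the existing content and writes at the end.
    with open(path, mode='a', encoding='utf-8') as f:
        f.write('second\n')
    with open(path, mode='r', encoding='utf-8') as f:
        after_append = f.read()

    # 'w' truncates the file before writing.
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('replaced\n')

    # 'r+' opens an existing file for both reading and writing.
    with open(path, mode='r+', encoding='utf-8') as f:
        after_truncate = f.read()
        f.write('added\n')   # the position is at the end after read()
        f.seek(0)
        after_update = f.read()

print(exclusive_failed)       # True
print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_truncate))   # 'replaced\n'
print(repr(after_update))     # 'replaced\nadded\n'
```

The letters combine: `'rb'` reads bytes, `'w+'` truncates and then allows reading, and so on.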

In [118]:
text = 'My string'
f = open('my_file.txt', mode='w', encoding='utf-8')

In [119]:
print(f)

<_io.TextIOWrapper name='my_file.txt' mode='w' encoding='utf-8'>


In [120]:
dir(f)

['_CHUNK_SIZE',
 '__class__',
 '__del__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_checkClosed',
 '_checkReadable',
 '_checkSeekable',
 '_checkWritable',
 '_finalizing',
 'buffer',
 'close',
 'closed',
 'detach',
 'encoding',
 'errors',
 'fileno',
 'flush',
 'isatty',
 'line_buffering',
 'mode',
 'name',
 'newlines',
 'read',
 'readable',
 'readline',
 'readlines',
 'reconfigure',
 'seek',
 'seekable',
 'tell',
 'truncate',
 'writable',
 'write',
 'write_through',
 'writelines']

In [121]:
print(f.writable())
print(f.closed)

True
False


In [122]:
f.write('Hello\n')

6

In [123]:
f.write('How are you?\n')
f.write('one more\n')
f.write('Ok, bye!\n')
f.write('empty\n')
f.write('A small text\tafter a <tab>')

26

After creating the file and writing to it, we should close it so that the buffered data is flushed to disk and the file becomes accessible to other code:

In [124]:
f.closed

False

In [125]:
f.close()

In [126]:
f.closed

True

In [127]:
#f.write('Trying to write again!\n')   # raises ValueError: I/O operation on closed file

In [128]:
g = open('my_file.txt', mode='r', encoding='utf-8')

In [129]:
print(g)

<_io.TextIOWrapper name='my_file.txt' mode='r' encoding='utf-8'>


In [130]:
print(g.writable())

False


In [131]:
#g.write('something')   # raises io.UnsupportedOperation: not writable

In [132]:
g.close()

### Reading the content of a file:

+ #### read()
+ #### readline()
+ #### readlines() 

### Reading the whole file at once: read()

In [133]:
g = open('my_file.txt', mode='r', encoding='utf-8')

all_text = g.read()
print(all_text)

Hello
How are you?
one more
Ok, bye!
empty
A small text	after a <tab>


In [134]:
new_attempt = g.read()
print(new_attempt)  # the file will seem to be empty, because the first read() left the pointer at the end of the file




In [135]:
g.seek(0) #moving to the first position again

0

In [136]:
new_attempt = g.read()
print(new_attempt)

Hello
How are you?
one more
Ok, bye!
empty
A small text	after a <tab>
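
The behaviour of the file pointer can be checked directly with `tell()` and `seek()`. This is a small sketch that assumes a temporary file with a hypothetical name (`pointer_demo.txt`) rather than the notebook's `my_file.txt`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'pointer_demo.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('Hello\nHow are you?\n')

    with open(path, mode='r', encoding='utf-8') as g:
        start = g.tell()          # 0: nothing has been read yet
        first_read = g.read()     # consumes everything up to the end
        second_read = g.read()    # already at the end, so this is ''
        end = g.tell()            # position after the last character
        g.seek(start)             # rewind to the beginning
        third_read = g.read()     # the full content again

print(start)                      # 0
print(repr(second_read))          # ''
print(third_read == first_read)   # True
```

In text mode the value returned by `tell()` should be treated as an opaque bookmark: it is safe to pass it back to `seek()`, but it is not guaranteed to be a character count.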


### Reading all lines into a list: readlines()

In [137]:
g.seek(0)
list_of_lines = g.readlines()
print(list_of_lines)

['Hello\n', 'How are you?\n', 'one more\n', 'Ok, bye!\n', 'empty\n', 'A small text\tafter a <tab>']


### Reading one line at a time: readline()

In [138]:
g.seek(0)
first_line = g.readline()
print(first_line)

Hello



In [139]:
second_line = g.readline()
print(second_line)

How are you?



In [140]:
third_line = g.readline()
print(third_line)
fourth_line = g.readline()
print(fourth_line)

one more

Ok, bye!



In [141]:
g.close()
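
Calling `readline()` repeatedly is rarely needed, because a file object is itself an iterator over its lines. The sketch below, which assumes a temporary file with a hypothetical name, loops over the file directly and strips the trailing `'\n'` that each line carries:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'lines_demo.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('Hello\nHow are you?\nOk, bye!\n')

    stripped_lines = []
    with open(path, mode='r', encoding='utf-8') as g:
        # Iterating reads one line at a time, so large files are not
        # loaded into memory all at once.
        for line_number, line in enumerate(g, start=1):
            stripped_lines.append((line_number, line.rstrip('\n')))

print(stripped_lines)
# [(1, 'Hello'), (2, 'How are you?'), (3, 'Ok, bye!')]
```

The trailing `'\n'` is also why the `print()` calls above show an extra blank line after each line: `print()` adds its own newline on top of the one read from the file.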

In [142]:
h = open('my_file.txt', mode='a', encoding='utf-8')

In [143]:
h.write('\nAdded a line\nand yet another one\n')

34

In [144]:
h.seek(0)

0

In [145]:
#h.read() # raises io.UnsupportedOperation: not readable

In [146]:
h.close()

### The pythonic way to deal with files:

In [147]:
with open('new_file.txt', mode='w', encoding='utf-8') as f:
    f.write('Weight\t\t72\n')
    f.write('Height\t\t183\n')
    f.write('Age\t\t44\n')
    f.write('Gender\t\tMasculine\n')
    f.write('\n')

In [148]:
f.closed

True
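
The main benefit of `with` is that the file is closed even when the block fails part-way. A minimal sketch, assuming a temporary file and a simulated error (both hypothetical, for illustration only):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'with_demo.txt')  # hypothetical file name
    try:
        with open(path, mode='w', encoding='utf-8') as f:
            f.write('partial\n')
            raise RuntimeError('simulated failure')  # hypothetical error
    except RuntimeError:
        pass

    closed_after_error = f.closed   # the with block closed the file anyway

    with open(path, mode='r', encoding='utf-8') as f:
        content = f.read()          # what was written before the error

print(closed_after_error)   # True
print(repr(content))        # 'partial\n'
```

With a manual `open()`/`close()` pair, the same error would skip the `close()` call unless it were wrapped in `try`/`finally`.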

In [149]:
with open('new_file.txt', mode='r', encoding='utf-8') as f:
    list_of_lines = f.readlines()
    
print(list_of_lines)

['Weight\t\t72\n', 'Height\t\t183\n', 'Age\t\t44\n', 'Gender\t\tMasculine\n', '\n']
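
Lines read back from a file are plain strings, so they usually need parsing before they are useful. The following sketch shows one possible way to turn the list above into a dictionary; the parsing rules (split on whitespace, convert digit-only values to `int`) are assumptions made for this example:

```python
list_of_lines = ['Weight\t\t72\n', 'Height\t\t183\n', 'Age\t\t44\n',
                 'Gender\t\tMasculine\n', '\n']

record = {}
for line in list_of_lines:
    parts = line.split()      # splits on any run of whitespace, tabs included
    if len(parts) != 2:       # skips the blank last line
        continue
    key, value = parts
    # Assumption: values made only of digits are meant to be integers.
    record[key] = int(value) if value.isdigit() else value

print(record)
# {'Weight': 72, 'Height': 183, 'Age': 44, 'Gender': 'Masculine'}
```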


### Deleting files

In [150]:
os.remove("./new_file.txt")
os.remove("./my_file.txt")
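
`os.remove()` raises `FileNotFoundError` when the file does not exist, which matters when a cleanup cell is run twice. The sketch below shows two common ways around that, using a hypothetical scratch file in a temporary directory:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / 'scratch.txt'   # hypothetical file name
    target.write_text('temporary\n', encoding='utf-8')

    os.remove(target)                 # first call succeeds
    try:
        os.remove(target)             # second call: the file is already gone
        second_remove_failed = False
    except FileNotFoundError:
        second_remove_failed = True

    # Option 1: check before deleting.
    if os.path.exists(target):
        os.remove(target)

    # Option 2 (Python 3.8+): let pathlib ignore a missing file.
    target.unlink(missing_ok=True)
    still_exists = target.exists()

print(second_remove_failed)   # True
print(still_exists)           # False
```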

## [_pickle_](https://docs.python.org/3/library/pickle.html)

#### The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”  
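
The byte stream mentioned above can be inspected without touching a file, using `pickle.dumps()` and `pickle.loads()`. A minimal sketch with a made-up object:

```python
import pickle

# A made-up nested object mixing several built-in types.
original = {'numbers': [1, 2, 3], 'pair': (4.5, 'text'), 'tags': {'a', 'b'}}

payload = pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

print(type(payload))          # <class 'bytes'>
print(restored == original)   # True: same content, types preserved
print(restored is original)   # False: unpickling builds a new object
```

Unlike JSON, pickle keeps Python-specific types such as tuples and sets. The pickle documentation also warns that unpickling can execute arbitrary code, so only data from a trusted source should be loaded.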

In [151]:
example_dict = {1:"6",2:"2",3:"f"}
example_dict

{1: '6', 2: '2', 3: 'f'}

### Saving

In [152]:
filehandler = open("./dict.pickle","wb")
print(type(filehandler))
pickle.dump(example_dict, filehandler)
filehandler.close()

<class '_io.BufferedWriter'>


#### Doing it the Pythonic way

In [153]:
with open("./dict.pickle","wb") as f:
    pickle.dump(example_dict, f)

In [154]:
del(example_dict) #deleting variable in the environment

In [155]:
#example_dict    #error

### Retrieving

In [156]:
with open("./dict.pickle","rb") as f:
    example_dict = pickle.load(f)

In [157]:
example_dict

{1: '6', 2: '2', 3: 'f'}

### Multiple objects:

In [158]:
a = 12
b = ['one', 'list']
c = ('one','tuple')
d = {1,2,4}

print(type(a))
print(type(b))
print(type(c))
print(type(d))

<class 'int'>
<class 'list'>
<class 'tuple'>
<class 'set'>


In [159]:
with open('my_objects.pkl', 'wb') as f:
    pickle.dump((a,b,c,d), f)

In [160]:
del a
del b
del c
del d

In [161]:
#print(type(a))
#print(type(b))
#print(type(c))
#print(type(d)) #error!

In [162]:
with open('my_objects.pkl', 'rb') as f:
    a,b,c,d = pickle.load(f)

In [163]:
print(type(a))
print(type(b))
print(type(c))
print(type(d))

<class 'int'>
<class 'list'>
<class 'tuple'>
<class 'set'>
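
Packing everything into one tuple is not the only option. As a sketch of an alternative, each object can be written with its own `pickle.dump()` call and read back with repeated `pickle.load()` calls until the file is exhausted; the file name below is hypothetical:

```python
import os
import pickle
import tempfile

objects = [12, ['one', 'list'], ('one', 'tuple'), {1, 2, 4}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'stream.pkl')  # hypothetical file name

    # Each dump() appends one self-contained pickle to the file.
    with open(path, 'wb') as f:
        for obj in objects:
            pickle.dump(obj, f)

    # Each load() reads exactly one pickle; EOFError marks the end.
    loaded = []
    with open(path, 'rb') as f:
        while True:
            try:
                loaded.append(pickle.load(f))
            except EOFError:
                break

print(loaded == objects)   # True
```

This is convenient when the number of objects is not known in advance, for example when records are appended over time.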


### To serialise functions or classes to files, use the module [dill](https://medium.com/@emlynoregan/serialising-all-the-functions-in-python-cd880a63b591)  
[Docs](https://dill.readthedocs.io/en/latest/dill.html)
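
The reason a separate module is needed is that the standard `pickle` stores functions and classes by reference (module plus name) and not by value. The sketch below uses only the standard library to illustrate this; it does not call `dill` itself:

```python
import pickle

# A named, importable function is stored as a reference to its name.
payload = pickle.dumps(len)
restored = pickle.loads(payload)
print(b'len' in payload)    # True: only the name is stored
print(restored is len)      # True: unpickling just looks the name up again

# A lambda has no importable name, so the standard pickle refuses it.
try:
    pickle.dumps(lambda x, y: x + y)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)       # False
```

Because only a name is stored, a function pickled this way cannot be loaded in a session where that name is not defined. `dill` is designed to serialize the function's code as well, which is what the `del summing` and reload cells in this section rely on.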

In [164]:
def summing(x,y):
    return x + y 

In [165]:
print(type(summing))

<class 'function'>


In [166]:
with open("my_function.dill", "wb") as f:
    dill.dump(summing, f)

In [167]:
del summing

In [168]:
#summing(3,4)    #error

In [169]:
with open("my_function.dill", "rb") as f:
    summing = dill.load(f)

In [170]:
summing(3,4)

7

In [171]:
class my_integer(int):
    def __init__(self, x):
        self.x = x
        
    def __add__(self,y):
        # note: as written, '+' on a my_integer performs a subtraction
        return self.x - y

In [172]:
x = my_integer(10)
print(x)

10


In [173]:
with open('my_class.dill', 'wb') as f:
    dill.dump(my_integer, f)

In [174]:
del my_integer

In [176]:
#x = my_integer(10)    #error

In [177]:
with open('my_class.dill', 'rb') as f:
    my_integer = dill.load(f)

In [178]:
x = my_integer(10)
print(x)

10


### Deleting files

In [179]:
os.remove("./dict.pickle")
os.remove("./my_objects.pkl")
os.remove("./my_function.dill")
os.remove("./my_class.dill")

## [_json_](https://docs.python.org/3/library/json.html)

### The json module can serialize lists, tuples, booleans, numbers, strings, None and dictionaries (tuples are written as JSON arrays, so they come back as lists). To be saved to a file, these structures must first be converted to a string; it is this string version that is written to or read from the file. Python's json module handles the conversion between Python data structures and JSON strings.  

+ [JSON](https://www.w3schools.com/whatis/whatis_json.asp) stands for JavaScript Object Notation
+ JSON is a lightweight format for storing and transporting data
+ JSON is often used when data is sent from a server to a web page
+ JSON is "self-describing" and easy to understand

### JSON Syntax Rules

+ Data is in name/value pairs
+ Data is separated by commas
+ Curly braces hold objects
+ Square brackets hold arrays
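
The four rules above can be seen in one small document. This is an illustrative sketch with made-up data; it also shows that the parser rejects text that breaks the rules, such as single-quoted names:

```python
import json

# Name/value pairs separated by commas, an object in curly braces,
# and an array in square brackets.
document = '''
{
  "name": "John",
  "age": 30,
  "languages": ["Python", "SQL"],
  "address": {"city": "New York", "zip": "10001"}
}
'''
parsed = json.loads(document)
print(type(parsed).__name__)               # dict
print(type(parsed["languages"]).__name__)  # list
print(parsed["address"]["city"])           # New York

# JSON requires double quotes around names and strings.
try:
    json.loads("{'name': 'John'}")
    single_quotes_accepted = True
except json.JSONDecodeError:
    single_quotes_accepted = False
print(single_quotes_accepted)              # False
```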

In [180]:
import json 

In [181]:
# some JSON:
x =  '{ "name":"John", "age":30, "city":"New York"}'

# parse x:
y = json.loads(x)

# the result is a Python dictionary:
print(y["age"]) 

30


In [182]:
type(y)

dict

In [183]:
# a Python object (dict):
x = {"name": "John", "age": 30, "city": "New York"}

# convert into JSON:
y = json.dumps(x)

# the result is a JSON string:
print(y)

{"name": "John", "age": 30, "city": "New York"}


In [184]:
type(y)

str

Convert Python objects into JSON strings, and print the values:

In [185]:
print(json.dumps({"name": "John", "age": 30}))
print(json.dumps(["apple", "bananas"]))
print(json.dumps(("apple", "bananas")))
print(json.dumps("hello"))
print(json.dumps(42))
print(json.dumps(31.76))
print(json.dumps(True))
print(json.dumps(False))
print(json.dumps(None)) 

{"name": "John", "age": 30}
["apple", "bananas"]
["apple", "bananas"]
"hello"
42
31.76
true
false
null
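
The output above shows that a tuple and a list produce the same JSON array, so a round trip through JSON does not always give back the original object. A short sketch with made-up data:

```python
import json

original = {"point": (1, 2), 1: "integer key", "nothing": None}
restored = json.loads(json.dumps(original))

print(restored)
# {'point': [1, 2], '1': 'integer key', 'nothing': None}
print(restored == original)   # False: the tuple became a list, the key a string

# Some Python types have no JSON equivalent at all.
try:
    json.dumps({1, 2, 4})
    set_serialized = True
except TypeError:
    set_serialized = False
print(set_serialized)         # False
```

When exact Python types must survive, pickle is the better fit; JSON is preferable when the data has to be readable by people or by other languages.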


In [186]:
x = {
  "name": "John",
  "age": 30,
  "married": True,
  "divorced": False,
  "children": ("Ann","Billy"),
  "pets": None,
  "cars": [
    {"model": "BMW 230", "mpg": 27.5},
    {"model": "Ford Edge", "mpg": 24.1}
  ]
}

print(json.dumps(x))

{"name": "John", "age": 30, "married": true, "divorced": false, "children": ["Ann", "Billy"], "pets": null, "cars": [{"model": "BMW 230", "mpg": 27.5}, {"model": "Ford Edge", "mpg": 24.1}]}
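
`json.dumps()` and `json.loads()` work with strings; their counterparts `json.dump()` and `json.load()` write to and read from an open file directly. A minimal sketch, assuming a hypothetical file name (`person.json`) in a temporary directory:

```python
import json
import os
import tempfile

person = {
    "name": "John",
    "age": 30,
    "married": True,
    "children": ["Ann", "Billy"],
    "pets": None,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'person.json')  # hypothetical file name

    # indent=2 makes the file easier to read; ensure_ascii=False keeps
    # non-ASCII characters as they are (no \uXXXX escapes).
    with open(path, mode='w', encoding='utf-8') as f:
        json.dump(person, f, indent=2, ensure_ascii=False)

    with open(path, mode='r', encoding='utf-8') as f:
        restored = json.load(f)

print(restored == person)   # True
```

The file is opened in text mode (`'w'`, not `'wb'`), because JSON is text, unlike a pickle.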


### Practical example: Web scraping and saving data:

In [189]:
!pip install -U -q requests beautifulsoup4 lxml

In [190]:
import requests
import string
from bs4 import BeautifulSoup
from collections import Counter

In [191]:
page = requests.get('https://en.wikipedia.org/wiki/FIFA_World_Cup')
soup = BeautifulSoup(page.text, "lxml")
text = soup.text
words = text.split()


In [None]:
upper = [m for m in words if m.istitle()]
upper_clean = [m.strip(string.punctuation) for m in upper]
upper_clean = [m.strip(string.digits) for m in upper_clean]
upper_clean = [m.strip(string.punctuation) for m in upper_clean]
upper_clean = [m for m in upper_clean if len(m)>1]
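
The cleaning steps can be tried offline on a short, made-up sentence. The sketch below repeats the same list comprehensions; the sample text is an assumption for illustration and is not taken from the Wikipedia page:

```python
import string

# Made-up sample, including a footnote marker like those found on Wikipedia.
sample = 'The World Cup (2018) was hosted by Russia. Brazil[1] has won FIFA titles.'
words = sample.split()

# istitle() keeps capitalised words; it drops all-caps words such as 'FIFA'.
upper = [m for m in words if m.istitle()]
print(upper)
# ['The', 'World', 'Cup', 'Russia.', 'Brazil[1]']

upper_clean = [m.strip(string.punctuation) for m in upper]        # 'Brazil[1]' -> 'Brazil[1'
upper_clean = [m.strip(string.digits) for m in upper_clean]       # 'Brazil[1'  -> 'Brazil['
upper_clean = [m.strip(string.punctuation) for m in upper_clean]  # 'Brazil['   -> 'Brazil'
upper_clean = [m for m in upper_clean if len(m) > 1]
print(upper_clean)
# ['The', 'World', 'Cup', 'Russia', 'Brazil']
```

`strip()` only removes characters from the two ends of a string, which is why punctuation is stripped twice: the second pass removes what the digit pass uncovered.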

In [None]:
frequencies = Counter(upper_clean)

In [None]:
frequencies.most_common(5)

[('World', 295), ('Cup', 261), ('The', 103), ('Retrieved', 91), ('Brazil', 64)]

In [None]:
with open('Fifa_stats.txt', 'w', encoding='utf-8') as f:
    for key, value in frequencies.items():
        if value > 4:
            f.write(f'The word {key} appears {value} times\n')
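
Writing sentences to a text file is easy to read, but the counts are hard to load back. As an alternative sketch, the same frequencies could be stored as JSON; the small `Counter` and the file name below are made up so that the example runs without the web request:

```python
import json
import os
import tempfile
from collections import Counter

# Made-up stand-in for the frequencies computed from the page.
frequencies = Counter(['World', 'Cup', 'World', 'Brazil', 'World', 'Cup'])

# Keep only the words above a threshold, as in the text version.
frequent = {word: count for word, count in frequencies.items() if count > 1}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'fifa_stats.json')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as f:
        json.dump(frequent, f, ensure_ascii=False, indent=2)

    with open(path, mode='r', encoding='utf-8') as f:
        restored = Counter(json.load(f))   # back to a Counter

print(restored.most_common(1))   # [('World', 3)]
```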

In [None]:
with open('Fifa_stats.txt', 'r', encoding='utf-8') as f:
    text = f.read()

In [None]:
print(text[0:1028])

The word World appears 295 times
The word Cup appears 261 times
The word Wikipedia appears 5 times
The word November appears 19 times
The word July appears 28 times
The word British appears 7 times
The word English appears 5 times
The word June appears 25 times
The word From appears 7 times
The word Association appears 38 times
The word This appears 11 times
The word For appears 6 times
The word France appears 38 times
The word Brazil appears 64 times
The word The appears 103 times
The word Fédération appears 31 times
The word Internationale appears 31 times
The word Football appears 58 times
The word War appears 7 times
The word Russia appears 14 times
The word In appears 22 times
The word Germany appears 63 times
The word Italy appears 36 times
The word Argentina appears 29 times
The word Uruguay appears 34 times
The word England appears 23 times
The word Spain appears 21 times
The word Olympic appears 15 times
The word Games appears 20 times
The word Mexico appears 26 times
The word