# Read and Write files
At this point, you have all (hopefully) proved in the homework that you know how to access the Wikipedia API. Moreover, using the code from the last class or today's exercise, you should be able to scrape a simple webpage like WikiLeaks or a Google Scholar search. Now the question arises: what to do next with the data you acquired? You have all the data stored in a variable in the Notebook, but you would probably like to write it out to your local machine. Once we are done with saving the data, we will show you how to load it back from your local machine.

## Write the data
Let's start with a really simple example. Imagine that, for some strange reason, we assigned the whole text of the WikiLeaks article on Fishrot to a variable `text`. The code below does just that.

In [None]:
## import packages
import requests
from bs4 import BeautifulSoup 

## get the response
response = requests.get('https://wikileaks.org/fishrot/')

In [None]:
## convert to Beautiful Soup object
html = BeautifulSoup(response.content, 'html.parser')

## create a list to store all the paragraphs
text_list = []

## populate the list with content of paragraphs
for p in html.select('div.leak-content'):
    ## extract text and remove unnecessary white spaces
    text_list.append(p.text.strip())

## create a long string with the content of paragraphs. Each paragraph will start from the new line sign.
text = "\n".join(text_list)

In [None]:
print(text)

Having the above code in the Notebook is a cool thing, and most likely it will work the next time we want to use it. However, it would be a bit pointless to execute the whole code every time we want to analyze the text about Fishrot. One obvious way around that would be to print the text and then copy and paste it into the desired file. Although it is not the most effective way, let's think for a moment about what this process looks like. What does your friend do when they want to save a text from the Internet to a file? They usually proceed as follows:
1. Copy the text
2. Open a file
3. Paste the text
4. Save the file

So in Python, you do a very similar thing. Let's say we want to save this `text` variable:

In [None]:
## We start with opening the file in write mode
file = open('fishrot.txt', mode='w')

## Then we have to put something there, so we call the method write on the file.
file.write(text)

## And at last we close the file
file.close()

Actually, the last part is important, although in Colab you might not notice it (I mean it might not raise an error). In normal Python, if you do not close the file it will explode. Not really, but the text passed to `write` is buffered, so without closing the file some or all of it may never reach the disk, and you might end up with an empty or incomplete file when you want to load it. However, there is a much smarter and more popular way of writing something out to a file.

In [None]:
with open('fishrot.txt', 'w') as file:
  file.write(text)

What happened above? Using the `with` statement, we told Python to open a file called `fishrot.txt` in write mode and assign it to a temporary variable `file`. Then we wrote the variable `text` to `fishrot.txt`, and the file was closed. So the good thing about using the `with` statement is that as soon as you leave the indented block, the file is safely closed.
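
To see that the file really is closed once the indented block ends, we can check the `closed` attribute of the file object. The snippet below is only a sketch: the file name `with_demo.txt` and the sample string are made up for illustration, and it writes to a temporary directory so that `fishrot.txt` is left alone.

```python
import os
import tempfile

## hypothetical sample text standing in for the scraped article
sample_text = "First paragraph.\nSecond paragraph."

## write to a throwaway directory so no real file is overwritten
demo_path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(demo_path, 'w', encoding='utf-8') as file:
    file.write(sample_text)
    ## inside the block the file is still open
    open_inside = file.closed

## after the block Python has closed the file for us;
## this also happens when an error is raised inside the block
closed_after = file.closed

print(open_inside, closed_after)
```

Passing `encoding='utf-8'` is optional, but it is a reasonable habit for scraped text, because the default encoding can differ between machines.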

Ok, so far so good. But our problem is not solved yet, because we only moved our data from the notebook to our workspace, and we still do not have it in a place we can access easily. If we want to download it to our local machine, we have two fairly easy options: click on the file and press download, or download it from Python. To do it from Python, we just import an additional module, `files`, and use its function `download`.

In [None]:
## We need to load package files to do it in Python
from google.colab import files
## And then just use function download
files.download('fishrot.txt') 

Ok, but what happens if we want to add something to our file? Would it be enough to just download another article and save it in the same file? Actually, there are two answers to that question: yes and no. If we write something to our file before closing it, then yes, it is appended to the end of the file. However, if we close the file and open it again in mode `w`, we overwrite the information that was there. So are we doomed, unable to append anything to a file? Obviously not. The function `open` allows us to open a file, put the cursor at the end of the file and write something there. We just need to use mode `a` instead of `w`.
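
The difference between the two modes can be sketched on a tiny throwaway file. The file name `append_demo.txt` and the two strings are made up for illustration and are not part of the exercise below.

```python
import os
import tempfile

## hypothetical file in a temporary directory (not the exercise file)
demo_path = os.path.join(tempfile.mkdtemp(), 'append_demo.txt')

## mode 'w' creates the file (or wipes it if it already exists)
with open(demo_path, 'w', encoding='utf-8') as file:
    file.write("first article\n")

## mode 'a' keeps the existing content and writes at the end
with open(demo_path, 'a', encoding='utf-8') as file:
    file.write("second article\n")

## had we used 'w' the second time, "first article" would be gone
with open(demo_path, 'r', encoding='utf-8') as file:
    content = file.read()

print(content)
```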

**Exercise 1.** Download two articles from WikiLeaks and save the content into one file called `wikileaks.txt`.

In [None]:
## Exercise 1.

## import packages
import requests
from bs4 import BeautifulSoup 

## links from which you need to download the content
links = ['https://wikileaks.org/popeorders/',
         'https://wikileaks.org/fishrot/']


## Read a file
Now everyone should have the file called `wikileaks.txt` saved safely in their workspace. But how do we load it into Python now? Yes, you guessed it: the process is similar to writing it out. First, we need to open the file and then read its content. We are going to use the function `open` again, but this time in read mode - `r`.

In [None]:
with open('wikileaks.txt', 'r') as file:
  text = file.read()

In [None]:
print(text)

Yeah, it is as simple as this. However, before we move on to writing more interesting data than just simple strings, let's first see whether we can read the text not as a string but as a list. You remember that we put the end of the line sign (`\n`) between the paragraphs, right? Let's try to read the file line by line instead of reading it as one long string.

In [None]:
with open('wikileaks.txt', 'r') as file:
  lines = file.readlines()

lines
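
Note that `readlines` keeps the `\n` at the end of every line. If you would rather have the lines without it, one option is to read the whole string and call `splitlines` on it. The sketch below shows both on a small made-up file (`lines_demo.txt` is a hypothetical name, written to a temporary directory).

```python
import os
import tempfile

## create a small hypothetical file with two paragraphs
demo_path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
with open(demo_path, 'w', encoding='utf-8') as file:
    file.write("paragraph one\nparagraph two\n")

## readlines keeps the end of the line sign in every element
with open(demo_path, 'r', encoding='utf-8') as file:
    raw_lines = file.readlines()

## read + splitlines gives the same lines without the '\n'
with open(demo_path, 'r', encoding='utf-8') as file:
    clean_lines = file.read().splitlines()

print(raw_lines)
print(clean_lines)
```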

This is again very good, but what about uploading a file from your local machine? Again, you can either use the graphical interface or do it in Python: either press Upload or type the following code.

In [None]:
## We need to import the module files to do it in Python
from google.colab import files
## And then just use the function upload
files.upload()
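
In Colab, `files.upload()` returns a dictionary whose keys are the names of the uploaded files and whose values are their contents as bytes. The sketch below shows one way to turn such a result into strings. Because `google.colab` exists only inside Colab, the dictionary `uploaded` here is a hand-made stand-in with a made-up file name and content; in the notebook you would write `uploaded = files.upload()` instead.

```python
## stand-in for `uploaded = files.upload()`; the real call returns
## a dictionary of this shape: {file name: content as bytes}
uploaded = {'wikileaks.txt': "first article\nsecond article\n".encode('utf-8')}

## decode every uploaded file into a string
texts = {}
for name, content in uploaded.items():
    ## the content arrives as bytes, so we decode it (assuming UTF-8)
    texts[name] = content.decode('utf-8')
    print(name, len(content), 'bytes')
```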

## JSON

Let's now move to something more interesting, that is, writing and loading files in the `json` format. You must remember that `json` looks very much like a `mapping`, also known as a `dictionary`, right? So let's first create a dictionary so we have some real data. To make it as simple as possible, we will scrape the data from WikiLeaks again, but this time we will store the title and the content of the article in a dictionary that we can later save as `json`.

In [None]:
## import packages
import requests
from bs4 import BeautifulSoup 

## assign url to url variable
url = 'https://wikileaks.org/dealmaker/Al-Yousef/'

## get the response
response = requests.get(url)

## convert to Beautiful Soup object
html = BeautifulSoup(response.content, 'html.parser')

## create an empty mapping
article = {}

## create a pair key-item with title
article['title'] = html.select_one('div.release h1').text

## extract the content
text_list = []
for p in html.select('div.leak-content'):
    ## append to the list text_list (not to the string text)
    text_list.append(p.text.strip())

## convert the list to a string
text = "\n".join(text_list)
    
## create a pair key-item with content
article['content'] = text

article

Before we move any further, let's do a short exercise.

**Exercise 2.** You have a mapping `article` that contains the title of the article and its content. Please add another field with the data.

In [None]:
## Exercise 2



Ok, so how do we write it out to a `json` file? It is not that hard. It follows a similar pattern to writing out a text file, but there is a small trick you will surely want to learn. However, let's start with the simplest possible way, that is, writing the `article` mapping to a `json` file. We just need the package `json` and the function `dumps` from this package. This function converts a dictionary to a string.

In [None]:
## import json package
import json

## write the dictionary to the file
with open('article.json', 'w') as file:
    file.write(json.dumps(article))

Loading a `json` is also quite simple. You just need to first read it, and then use function `loads` to convert a `string` to a `dict` type.

In [None]:
## import json package
import json

## read the dictionary from the file
with open('article.json', 'r') as file:
    ## read the string
    article_str = file.read()
    ## convert string to dictionary and assign it to article_dict
    article_dict = json.loads(article_str)
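
The `json` package also offers `dump` and `load` (without the final `s`), which work directly on the file object, so the intermediate string is not needed. The snippet below is a minimal sketch of that round trip. The dictionary `demo_article` and the file name `demo_article.json` are made up for illustration, and the arguments `ensure_ascii=False` (keep non-ASCII characters readable) and `indent=2` (pretty-print) are optional.

```python
import json
import os
import tempfile

## hypothetical article standing in for the scraped one
demo_article = {'title': 'Example title',
                'content': 'Café paragraph.\nSecond paragraph.'}

## write to a throwaway directory so article.json is not touched
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_article.json')

## dump writes the dictionary straight to the open file
with open(demo_path, 'w', encoding='utf-8') as file:
    json.dump(demo_article, file, ensure_ascii=False, indent=2)

## load reads the file and returns a dictionary
with open(demo_path, 'r', encoding='utf-8') as file:
    loaded_article = json.load(file)

print(loaded_article == demo_article)
```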

This is all very nice, but let's imagine that we have a list of dictionaries and we want to first save it to a JSON Lines (`jl`) file and then read it back. First, let's see what we mean by that. Let's create an artificial list of dictionaries in which every single mapping has the same fields.

In [None]:
## create a list of mappings
artificial_list = [
    {"name":"Alice",
     "age":17,
     "interests":[
        {"name":"physics",
         "fields":["quantumphysics","stringtheory"]
        },
        {"name":"sport",
         "fields":["fishing","football"]
        }]
    },
    {"name":"Bob",
     "age":15,
     "interests":[
        {"name":"sport",
         "fields":["football"]
        }]
    }
]

What we will do is write every single dictionary on a separate line of our file, so the file mirrors our list: one line per element. This is a very useful approach, especially when you read or save longer files with many records.

In [None]:
## import json package
import json

## write the list of dictionaries to a json line file
with open('bob_and_alice.jl', 'w') as file:
    ## iterate over elements of the list
    for line in artificial_list:
        ## write every dictionary to a line
        file.write(json.dumps(line) + '\n')
    

Reading this kind of file is a bit more complicated but not that hard. Again, we will need the `json` package. This time, however, we will use the method `readlines` instead of `read` because we want to create a list. Every line is read as a string, which we then transform into a mapping with `loads`.

In [None]:
## import json package
import json

## create a list
bob_and_alice = []

## read every line of the file and append it to the list
with open('bob_and_alice.jl', 'r') as file:
    lines = file.readlines()
    for line in lines:
        bob_and_alice.append(json.loads(line))
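
Calling `readlines` loads the whole file into memory at once. For longer files, an alternative is to iterate over the file object itself, which yields one line at a time. The sketch below does the full round trip on a small made-up list (`people` and the file name `people.jl` are hypothetical) and also skips empty lines, which `json.loads` would otherwise reject.

```python
import json
import os
import tempfile

## hypothetical list of dictionaries
people = [{"name": "Alice", "age": 17},
          {"name": "Bob", "age": 15}]

## write to a throwaway directory so bob_and_alice.jl is not touched
demo_path = os.path.join(tempfile.mkdtemp(), 'people.jl')

## write one dictionary per line
with open(demo_path, 'w', encoding='utf-8') as file:
    for person in people:
        file.write(json.dumps(person) + '\n')

## read the file back line by line
loaded_people = []
with open(demo_path, 'r', encoding='utf-8') as file:
    for line in file:
        ## remove the end of the line sign and skip empty lines
        line = line.strip()
        if line:
            loaded_people.append(json.loads(line))

print(loaded_people)
```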