<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:200%;
           font-family:Arial;letter-spacing:0.5px">

<p width = 20%, style="padding: 10px;
              color:white;">
Base Python: File Input and Output
              
</p>
</div>

Data Science Cohort Live NYC Feb 2022
<p>Phase 1: Topic 3</p>
<br>
<br>

<div align = "right">
<img src="Images/flatiron-school-logo.png" align = "right" width="200"/>
</div>
    
    

#### File input/output in Python
* Read data from files into Python
* Write data from Python into files
* Key file types and how to interact with them in Python

#### "In Memory" 

- "In Memory" = in RAM 
    - data is loaded here while in use, so it can be manipulated quickly.
    - the program (Python) accesses and manipulates data in this memory.
    - data is released from RAM when the Python kernel shuts down.
    - limited capacity (a few gigabytes on a typical machine)

   

Create a variable x:

In [None]:
x = [5, 6, 7]

- `x` now exists in memory (RAM).

In [None]:
hex(id(x))

Then let's say we modify the value of this variable:

In [None]:
x[0] = 10
x

If we check the address, it hasn't changed:

In [None]:
hex(id(x))

Elements in the memory allotted to `x` can be accessed and changed quickly.
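
As a minimal sketch of why the address stays the same: changing an element modifies the existing list object in place, whereas assigning a brand-new list to the name creates a different object. The name `demo_list` is hypothetical and used only for illustration.

```python
demo_list = [5, 6, 7]
address_before = id(demo_list)

demo_list[0] = 10                     # in-place change: still the same list object
address_after_mutation = id(demo_list)

demo_list = [10, 6, 7]                # rebinding: the name now points to a new list object
address_after_rebinding = id(demo_list)

print(address_before == address_after_mutation)    # same object after mutation
print(address_before == address_after_rebinding)   # different object after rebinding
```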

Restart the kernel.

In [None]:
x

`x` is gone from memory.

This is "in-use" memory: its contents disappear when the software using it stops.

#### "On disk" = hard drive (or remote server):
- long term storage
- terabytes to petabytes
- too slow for direct manipulation



#### File I/O: General

1. Reading: access a file on disk and load its contents into memory.
2. Writing: take data held in memory and save it to disk.

#### 1. Locating files: directory and path with os library

In [None]:
import os

Get the current working directory
- Example of an absolute path string:

In [None]:
os.getcwd()

List files and folders in current directory

In [None]:
os.listdir()

Change Directory (relative path)

In [None]:
os.chdir('Data/')

In [None]:
os.getcwd()

In [None]:
os.listdir()

Going back up a directory level

In [None]:
os.chdir('../')

In [None]:
os.getcwd()

In [None]:
os.listdir()

#### OS path checking
- Is the file there?

In [None]:
os.path.exists('C:\\Users\\prave\\Flatiron_Lectures\\Phase1_Topic3_Data_BasePython\\Data\\zen_of_python.txt')

In [None]:
os.getcwd()

In [None]:
os.chdir('../')  # only needed if the working directory is still Data/; skip if already at the project root

In [None]:
os.path.exists('Data/zen_of_python.txt')

Let's store the relative path to the Zen of Python file as a string:

In [None]:
file_path = "Data/zen_of_python.txt"
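
A hard-coded string works here, but the same path can also be assembled from its parts. The sketch below is one possible approach using `os.path.join` and `os.path.abspath`; the variable names `joined_path` and `absolute_path` are assumptions for illustration, and the code does not require the file to exist.

```python
import os

# Build the relative path from its parts; os.path.join inserts the
# separator that is appropriate for the operating system.
joined_path = os.path.join("Data", "zen_of_python.txt")

# Convert the relative path into an absolute one,
# based on the current working directory.
absolute_path = os.path.abspath(joined_path)

print(joined_path)
print(absolute_path)
print(os.path.isabs(joined_path), os.path.isabs(absolute_path))
```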

#### 2. Opening a File in Python

- Establishes communication between Python kernel and file on disk.
- `open` built-in function.
- File can be opened in different modes: read, write, read/write
    - read (mode = 'r')
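
To make the modes concrete, here is a small sketch that uses write (`'w'`), append (`'a'`), and read (`'r'`) on a scratch file. The temporary directory and the file name `modes_demo.txt` are hypothetical, chosen so the example does not touch the lecture data.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file, or empties it if it already exists.
f = open(demo_path, "w")
f.write("first line\n")
f.close()

# 'a' appends to the end without erasing the existing content.
f = open(demo_path, "a")
f.write("second line\n")
f.close()

# 'r' (the default mode) opens the file for reading only.
f = open(demo_path, "r")
modes_demo_text = f.read()
f.close()

print(modes_demo_text)
```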

Opening a file for read:

In [None]:
file_obj = open(file_path, 'r')

Ok, we opened the file! What do we have now?

In [None]:
type(file_obj)

This is a text file object: communication enabled for reading operations.

#### 3. Reading Contents of the File

- use methods of the file object to actually read the file contents:
    - `file_obj.readline()`: reads one line per call, keeping track of the position in the file.
    - `file_obj.readlines()`: returns a list in which each element is the string for one line.
    - `file_obj.read()`: reads the entire file in as one string.

#### .readline()

In [None]:
file_obj.readline()

Use this when you want to process a file line by line:
- useful when you have a huge file on disk
- keeps memory usage low, since only one line is loaded at a time
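
A common way to process a file line by line is to loop over the file object itself, which yields one line per iteration. The sketch below assumes a tiny scratch file (`lines_demo.txt`, a hypothetical name) standing in for a large one.

```python
import os
import tempfile

# Hypothetical small file standing in for a huge one.
demo_path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
writer = open(demo_path, "w")
writer.write("alpha\nbeta\ngamma\n")
writer.close()

reader = open(demo_path, "r")
line_count = 0
longest_line = ""
# Iterating over the file object yields one line at a time,
# so the whole file never has to be held in memory at once.
for line in reader:
    line_count += 1
    stripped = line.rstrip("\n")
    if len(stripped) > len(longest_line):
        longest_line = stripped
reader.close()

print(line_count, longest_line)
```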

To go back to beginning:

In [None]:
file_obj.seek(0)

#### .readlines()

In [None]:
content = file_obj.readlines()
content

In [None]:
print(type(content))
content[0:4]

#### .read()

In [None]:
file_obj.read()

What happened?

In [None]:
file_obj.seek(0)
file_obj.read()
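
The empty result from the first `.read()` comes from the file position: `.readlines()` had already moved it to the end of the file, so nothing was left to read until `.seek(0)` rewound it. The sketch below reproduces this with `io.StringIO`, an in-memory stand-in for a text file (an assumption made so the example runs without any file on disk).

```python
import io

# In-memory stand-in for an open text file.
demo_file = io.StringIO("line one\nline two\n")

all_lines = demo_file.readlines()   # consumes everything; position is now at the end
position_at_end = demo_file.tell()  # .tell() reports the current position
second_read = demo_file.read()      # nothing left to read

demo_file.seek(0)                   # rewind to the beginning
third_read = demo_file.read()       # the full text is available again

print(repr(second_read))
print(repr(third_read))
```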

#### 4. Closing the File

Once you are finished accessing the file:
- Close the communication channel to the file!
- Leaving it open can cause problems.
- It is easy to forget to close.
- Use the `.close()` method.

In [None]:
file_obj.close()

This means that your Python code is fully disconnected from the file. If you try to read from the file object again, you will get an error message:

In [None]:
file_obj.readlines()
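
The error raised in this situation is a `ValueError`. As a hedged sketch, the `.closed` attribute can be checked first, or the error can be caught; `io.StringIO` is used here as an in-memory stand-in for a real file object.

```python
import io

demo_file = io.StringIO("some text\n")  # in-memory stand-in for an open file
demo_file.close()

print(demo_file.closed)  # the .closed attribute reports whether the file is closed

try:
    demo_file.readlines()
    read_error = None
except ValueError as err:
    # Reading from a closed file object raises ValueError.
    read_error = err

print(type(read_error).__name__)
```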

The file is closed, but the list `content` is still in memory.

- Analyze the data using Python.
- E.g., count how many lines of that document contain the phrase "is better than":

In [None]:
is_better_than_count = 0
for line in content:
    if "is better than" in line:
        is_better_than_count += 1
        
print("The phrase 'is better than' appears", is_better_than_count, "times")
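
The same count can be written more compactly with `sum` over a generator expression, since `True` counts as 1. This is only an alternative sketch; `sample_lines` is a hypothetical stand-in for `content`.

```python
# Hypothetical stand-in for the lines read from the file.
sample_lines = [
    "Beautiful is better than ugly.\n",
    "Explicit is better than implicit.\n",
    "Readability counts.\n",
]

# Each membership test is True or False; sum adds up the True values.
compact_count = sum("is better than" in line for line in sample_lines)

print("The phrase 'is better than' appears", compact_count, "times")
```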

#### 5. The context manager


Use this almost always:
- The construct handles opening and closing automatically
- Shorter syntax using `with`

In [None]:
file_obj = open(file_path)
file_contents = file_obj.readlines()
file_obj.close()

In [None]:
with open(file_path) as file_obj:
    file_contents = file_obj.readlines()

#### Writing to Files

A file write saves information held in Python's memory to a file on disk.

Example flow:
1. Read raw file into memory. 
2. Process data (list of strings, etc.)
3. Save processed data to new file on disk.



The process for reading the data has already been completed, and each line is stored in the variable `file_contents`:

In [None]:
file_contents

Then this code will create a cleaned version:

In [None]:
import string
file_contents_cleaned = []
for line in file_contents:
    words = line.split()
    cleaned_words = [word.strip(string.punctuation).lower() for word in words]
    cleaned_line = " ".join(cleaned_words) + "\n"
    file_contents_cleaned.append(cleaned_line)
file_contents_cleaned

Check if `zen_of_python_cleaned.txt` exists.

In [None]:
os.listdir()

#### To write file:

1. `open(path, mode='w')` for write mode
2. `.write()` method on the file object

In [None]:
file_contents_cleaned

In [None]:
output_string = "".join(file_contents_cleaned)

In [None]:
with open("zen_of_python_cleaned.txt", mode="w") as output_file_obj:
    output_file_obj.write(output_string)
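
As an alternative to joining the list into one string, the file object's `.writelines()` method accepts a list of strings directly. It does not add newlines, so each string must already end in `"\n"`. The sketch below uses a hypothetical two-line list and a scratch file, then reads the file back to confirm what was written.

```python
import os
import tempfile

# Hypothetical cleaned lines; each already ends with a newline.
cleaned_lines = [
    "beautiful is better than ugly\n",
    "explicit is better than implicit\n",
]
out_path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")

with open(out_path, mode="w") as out_file:
    # writelines writes each string as-is, without adding separators.
    out_file.writelines(cleaned_lines)

# Read the file back to check the round trip.
with open(out_path) as in_file:
    round_trip = in_file.readlines()

print(round_trip == cleaned_lines)
```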

**If you are ever wondering why your data has not actually been saved to a file, double-check that the file has been closed, or that the buffer has been cleared in some other way.**

- Close files when you are finished with them, whether they were used for reading or writing.
- Failing to close a file causes the most significant problems when writing.
- Make your life less stressful with the context manager.
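
The buffering issue can be observed with a small sketch: text passed to `.write()` may sit in Python's buffer until the file is flushed or closed. The scratch file `buffer_demo.txt` is hypothetical, and the size measured before closing depends on the platform and buffer settings, so it is printed rather than relied upon.

```python
import os
import tempfile

buffer_path = os.path.join(tempfile.mkdtemp(), "buffer_demo.txt")

out_file = open(buffer_path, "w")
out_file.write("pending text\n")
# At this point the text may still be in the buffer rather than on disk.
size_before_close = os.path.getsize(buffer_path)

out_file.close()  # closing flushes the buffer to disk
size_after_close = os.path.getsize(buffer_path)

print("bytes on disk before close:", size_before_close)
print("bytes on disk after close: ", size_after_close)
```

Calling `out_file.flush()` would also push the buffered text to disk without closing the file.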

Next, we'll be using what we learned to open and read from/write to common data file formats:
- CSV
- JSON
- Excel (xls)
- Images (jpg, gif, etc)
- Pickling (.pkl)