# Python Files

## Reading from files

You have seen how to read from a file with `pandas.read_csv` before, but Python also has a lower-level interface for reading from and writing to any file.

Let’s start by writing to a file:

In [1]:
f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Concordia\n')           # '\n' adds a new line
f.write('Bootcamps!')
f.close()                      # Close the file 

Here:

- The built-in function `open()` creates a file object for writing to.  
- Both `write()` and `close()` are methods of file objects.  
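
The second argument to `open()` is the mode. As a minimal sketch (the file name `modes_demo.txt` and the temporary directory are placeholders, not part of the lecture files), the following shows that `'w'` starts from an empty file each time, while `'a'` appends to what is already there:

```python
import os
import tempfile

# Work in a throwaway directory so nothing in the lecture folder is touched
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'modes_demo.txt')  # hypothetical file name

f = open(demo_path, 'w')   # 'w' creates the file, or empties it if it exists
f.write('first\n')
f.close()

f = open(demo_path, 'w')   # opening with 'w' again discards 'first'
f.write('second\n')
f.close()

f = open(demo_path, 'a')   # 'a' keeps the existing content and adds to the end
f.write('third\n')
f.close()

f = open(demo_path, 'r')   # 'r' opens the file for reading
contents = f.read()
f.close()

print(contents)
```

Only `second` and `third` should be printed, since the second `'w'` call wiped the first line.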

Where is the file that we’ve created?

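Like the terminal shell, your running Python instance has a concept of the present working directory (pwd).

On macOS and Linux we can access it with Jupyter's `!` shell commands. On Windows, `pwd` is not a command of the default shell, so the cell fails as shown here: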

In [2]:
!pwd

'pwd' is not recognized as an internal or external command,
operable program or batch file.


However, we can also get it from Python directly, which works on every operating system:

In [10]:
import os

# equivalent to ls
os.listdir()
# equivalent to pwd
os.getcwd()

'C:\\Users\\Simona\\Desktop\\Data Analysis\\concordia-bootcamp\\ds-data-visualization-P3-main\\ds-data-visualization-P3-main\\_lecture'

If only a file name is given, with no directory in front of it, this is where Python reads from and writes to.

Normally this is the folder that contains the notebook file.

We can also point to a file in a specific directory by putting the full path in front of the file name:

In [11]:
# Will be different for each person - use your own path
f = open('C:\\Users\\Simona\\Desktop\\Data Analysis\\concordia-bootcamp\\ds-data-visualization-P3-main\\ds-data-visualization-P3-main\\_lecture\\newfile.txt', 'r')
f.close()

Note that every backslash in a Windows path must be doubled inside a normal string. A single `\n` in the path is interpreted as a newline character, and `open()` then fails with `OSError: [Errno 22] Invalid argument`.
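
Typing long paths by hand is error-prone, especially with backslashes. As a hedged sketch of one alternative, the standard library's `pathlib` module can build the path from the working directory. The variable names below are illustrative only:

```python
import os
from pathlib import Path

# Path.cwd() is the same directory that os.getcwd() reports, as a Path object
here = Path.cwd()

# The / operator joins path components using the right separator for the OS
target = here / 'newfile.txt'

# A raw string (r'...') is another way to keep backslashes literal on Windows
raw_example = r'_lecture\newfile.txt'

print(target)
print(raw_example)
```

The resulting `target` can be passed to `open()` in place of a string.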

We can also use Python to read the contents of `newfile.txt`. First we write the file again, then we open it in read mode (`'r'`):

In [None]:
f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Concordia\n')           # '\n' adds a new line
f.write('Bootcamps!')
f.close()                      # Close the file 

In [None]:
f = open('newfile.txt', 'r')   # Open 'newfile.txt' for reading
out = f.read()                 # Read the whole file into one string
f.close()                      # Close the file
out

In [None]:
print(out) # Notice the \n is now printed as a line break

In [None]:
# A "with" block automatically closes the file when the block exits
with open('cities.csv', 'w') as f:
    f.write(
"""city, population
new york, 8244910
los angeles, 3819702
chicago, 2707120
houston, 2145146
philadelphia, 1536471
phoenix, 1469471
san antonio, 1359758
san diego, 1326179
dallas, 1223229""")

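Note the `"""string"""` syntax here, which lets you write multi-line strings. Everything between the quotes, including line breaks and any leading spaces, ends up in the file.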

In [None]:
import pandas as pd

# We could read this file with whatever extension
# as long as it's organized like a csv file
pd.read_csv('cities.csv')

## File types

Common operating systems use file extensions to tell programs how the file is organized. Here are a few common ones:

- `.txt` files are plain text files. You can open them in text editing programs (VS Code, Sublime Text, Notepad, Micro, Vim, etc.)

- Some files are text files but have extensions that hint at how they're organized. For instance, a `.py` file is a text file which we hint contains Python code.

- Files with extensions like `.exe` (in Windows) or `.bin` are **binary** files. Their bytes are not meant to be decoded as text but are read directly by the operating system or by a specific program. (Try opening a `.exe` file in Sublime Text to see this.)
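
The difference between text and binary is easiest to see by opening the same file in both ways. The sketch below is illustrative only: it writes a small text file to a temporary directory (the name `demo.txt` is made up) and reads it back with mode `'rb'`, which returns raw bytes instead of a string:

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical file

# Write ordinary text
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('Concordia')

# 'rb' reads the raw bytes with no decoding into text
with open(demo_path, 'rb') as f:
    raw = f.read()

print(raw)            # a bytes object, shown with a b'...' prefix
print(list(raw[:3]))  # the numeric values of the first three bytes
```

Each character of a plain text file is stored as one or more such byte values. A binary file is also a sequence of bytes, but one that does not follow a text encoding.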

### CSV files

A CSV file is a common kind of data file organized by **records** (one per line) with **fields** (separated by commas). The `cities.csv` file written above is one such file.

Many Python objects are “iterable”, in the sense that they can be looped over. The list of lines returned by `readlines()` is one example.

To give an example, let’s read `cities.csv` line by line and write the city names and populations to the file **us_cities.txt** in the present working directory.


In [None]:
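cities = []

# Read every line of the CSV file and split it into its fields
with open('cities.csv', 'r') as data_file:
    lines = data_file.readlines()

for line in lines:
    fields = line.split(',')
    cities.append(fields)

# Write one record per line; strip() removes surrounding
# whitespace, including the trailing '\n' of each field
with open('us_cities.txt', 'w') as le_file:
    for fields in cities:
        le_file.write(str([fields[0].strip(), fields[1].strip()]) + '\n')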

Note that the file **header** (the first line, which names the columns) is read like any other line, and that each line still ends with its `\n` character, which is why we call `strip()` on the fields.
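
Splitting on commas by hand works for simple files. As a hedged alternative, the standard library's `csv` module can do the splitting and lets us skip the header explicitly. The sample file name and its two records below are placeholders written to a temporary directory:

```python
import csv
import os
import tempfile

csv_path = os.path.join(tempfile.mkdtemp(), 'cities_sample.csv')  # hypothetical

# A small file in the same layout as cities.csv
with open(csv_path, 'w', newline='') as f:
    f.write('city, population\nnew york, 8244910\nlos angeles, 3819702\n')

rows = []
with open(csv_path, 'r', newline='') as f:
    # skipinitialspace=True drops the space that follows each comma
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)               # consume the header row
    for city, population in reader:
        rows.append((city, int(population)))

print(header)
print(rows)
```

Here the header ends up in its own variable and the populations are converted to integers, so `rows` contains only data.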

## Writing Python in a .py program

We can write a program directly in a `.py` file and run it using `python my_program.py` in the terminal.

In such a file, the block under `if __name__ == '__main__':` serves as the **entry point**: it runs when the file is executed as a program, but not when the file is imported.

We can treat the `.py` file either as text or as a Python module.

In [12]:
# A "with" block automatically closes the file when the block exits
with open('test.py', 'w') as f:
    f.write(
"""
import numpy as np

def double_square(x):
    return np.square(np.square(x))
""")

We can treat it as text:

In [13]:
with open('test.py', 'r') as f:
    print(f.read())


import numpy as np

def double_square(x):
    return np.square(np.square(x))



But we can also `import` it as a Python module!

In [14]:
import test

test.double_square(5)

625
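
The entry-point idiom mentioned at the start of this section can be sketched as follows. This is an illustrative layout for a file such as `my_program.py`; the function names `double` and `main` are made up for the example:

```python
def double(x):
    """Return twice the input."""
    return 2 * x


def main():
    # Code that should only run when the file is executed as a program
    print(double(21))


# __name__ is '__main__' when the file is run with `python my_program.py`,
# and the module's own name when the file is imported
if __name__ == '__main__':
    main()
```

Running `python my_program.py` in the terminal calls `main()`, whereas `import my_program` only defines the functions, so `my_program.double` can be reused without triggering the print.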