# Files and Folders using Python

#### As we create and tap files for analysis, we need to stay organized programmatically.

- Let's understand Google Colab storage structure.

- We'll use the ```os``` module and ```pathlib``` to create, navigate and delete files and folders programmatically.

- We'll also use command line/UNIX commands like ```ls```, ```cd``` and ```mkdir```.

- [Download the sample files](https://drive.google.com/file/d/1uTbYtyV_1QBLvW0yUKaDRuk8Bgjhl3Er/view?usp=sharing) we will need.

In [14]:
## import libraries
import os  ## lets you navigate, create and delete folders
from pathlib import Path ## lets you create paths to files and folders
import shutil ## lets you delete a directory that still has files in it
# from google.colab import files ## code for downloading in google colab
import glob ## collects specific files into a list


## UNIX Command Line

### NOTE: each of these commands has to be in a cell by itself

## Where am I?

## ```pwd```

In [4]:

pwd

'/Users/sandeep.junnarkar/Dropbox/coding/courses/fall21-students-practical-python/in-class'
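The same "where am I?" question can be asked from Python instead of the command line. This is a minimal sketch using the standard library; the variable names `current_dir` and `current_path` are only illustrative.

```python
import os
from pathlib import Path

# os.getcwd() returns the current working directory as a plain string
current_dir = os.getcwd()
print(current_dir)

# Path.cwd() returns the same location as a Path object
current_path = Path.cwd()
print(current_path)
```

The string version is handy for printing; the `Path` version is handy when you want to build other paths from it.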

## list directories

## ```ls```

In [2]:
ls

week-1-working-with-datatypes-BLANKS.ipynb
week-2-inclass-exercises-BLANKS.ipynb
week-2-inclass-exercises-DEMO.ipynb
week-3-A-list comprehensions BLANKS.ipynb
week-3-A-list comprehensions DEMO.ipynb
week-3-B-defined-functions-BLANKS.ipynb
week-3-B-defined-functions-DEMO.ipynb
week-3-C-Lambdas-BLANK.ipynb
week-4-scraping-intro-beautifulsoup-BLANKS.ipynb
week-4-scraping-intro-beautifulsoup-DEMO.ipynb
week-5-A-single-page-table_BLANKS.ipynb
week-5-A-single-page-table_DEMO.ipynb
week-5-B-inclass-non-tabular-scrape_BLANKS.ipynb
week-5-B-inclass-non-tabular-scrape_DEMO.ipynb
week-6-multi-page-table-scrape-BLANK.ipynb
week-6-multi-page-table-scrape-DEMO.ipynb
week-7-download_docs_BLANK.ipynb
week-7-download_docs_DEMO.ipynb
week-7-flattening_lists_BLANK.ipynb
week-7-flattening_lists_DEMO.ipynb
week-8-sample-folder/
week-8-sample-folder.zip
week-8A-file_folder_mngmt_BLANK.ipynb
week-8A-file_folder_mngmt_DEMO.ipynb
week-8B-download-and-read_BLANK.ipynb


## change directories

## ```cd```

let's enter our ```sample_data``` folder

In [5]:
pwd

'/Users/sandeep.junnarkar/Dropbox/coding/courses/fall21-students-practical-python/in-class'

In [6]:
cd ..


/Users/sandeep.junnarkar/Dropbox/coding/courses/fall21-students-practical-python


## What does this folder hold?

In [7]:
ls

LICENSE                      homework/
README.md                    in-class/
advanced scraping workshops/


In [8]:
cd in-class/

/Users/sandeep.junnarkar/Dropbox/coding/courses/fall21-students-practical-python/in-class
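The ```cd``` moves above have a Python counterpart in ```os.chdir()```. The sketch below assumes a throwaway temporary folder (created with `tempfile`, not part of the course files) so that it does not touch your real directories.

```python
import os
import tempfile

start_dir = os.getcwd()

# Hypothetical scratch folder, used only so the example is safe to run
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)        # like: cd <folder>
    inside = os.getcwd()

    os.chdir("..")           # like: cd ..
    parent = os.getcwd()

    os.chdir(start_dir)      # go back to where we started

print(inside)
print(parent)
```

Saving the starting directory first makes it easy to return to it when you are done.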


## Back out of the folder

```cd ..``` moves up one level; ```cd ~``` jumps straight to your home folder.

In [9]:
cd ~

/Users/sandeep.junnarkar


Where am I?

In [10]:
pwd

'/Users/sandeep.junnarkar'

In [11]:
ls


Applications/         Dropbox/              Public/
Creative Cloud Files/ Library/              websites/
Desktop@              Movies/               week-8-sample-folder/
Documents@            Music/
Downloads@            Pictures/


In [12]:
cd week-8-sample-folder/

/Users/sandeep.junnarkar/week-8-sample-folder


In [13]:
ls

alien-invasion.jpg  alien-invasion.png  delays.txt          manager.txt
alien-invasion.pdf  ceo_bios.csv        energy.csv          password.png


# Programmatic Folder/Files Management

- We'll use the ```os``` module.

In [15]:
## Python scriptable
os.listdir()

['ceo_bios.csv',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']

In [16]:
## what object is that?
type(os.listdir())

list

In [17]:
## create a path to folder called some_new_folder
## we store that path in a variable called my_new_directory
my_new_directory = Path("some_new_folder")

In [18]:
## create that directory
## exist_ok=True means no error is raised if the folder already exists
my_new_directory.mkdir(exist_ok=True)

### You don't have to create a variable for the path, but a variable makes it easier to reuse that path
```Path('folder_name/').mkdir(exist_ok=True)```
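To see what ```exist_ok=True``` actually changes, here is a small sketch. It works inside a temporary scratch folder (an assumption for safety), and the nested folder names `projects/week_8/data` are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    new_dir = Path(scratch) / "some_new_folder"
    new_dir.mkdir(exist_ok=True)   # creates the folder
    new_dir.mkdir(exist_ok=True)   # folder already exists: nothing happens

    # Without exist_ok=True, creating an existing folder raises FileExistsError
    try:
        new_dir.mkdir()
        raised = False
    except FileExistsError:
        raised = True

    # parents=True also creates any missing folders along the way
    nested = Path(scratch) / "projects" / "week_8" / "data"
    nested.mkdir(parents=True, exist_ok=True)
    nested_exists = nested.is_dir()

print(raised, nested_exists)
```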

In [19]:
### create junk_folder
my_new_directory = Path("junk_folder")
my_new_directory.mkdir(exist_ok = True)


UNIX command to show list of folders

In [20]:
ls

alien-invasion.jpg  ceo_bios.csv        junk_folder/        some_new_folder/
alien-invasion.pdf  delays.txt          manager.txt
alien-invasion.png  energy.csv          password.png


In [21]:
## show list programmatically
os.listdir()

['.DS_Store',
 'junk_folder',
 'ceo_bios.csv',
 'some_new_folder',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']

## let's delete a folder

In [None]:
## remove an empty directory
## NOTE: This only removes empty directories

In [22]:
rmdir some_new_folder/

In [25]:
ls

alien-invasion.jpg  ceo_bios.csv        junk_folder/
alien-invasion.pdf  delays.txt          manager.txt
alien-invasion.png  energy.csv          password.png


In [24]:
## show directory now programmatically
os.listdir()

['.DS_Store',
 'junk_folder',
 'ceo_bios.csv',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']

## Manually add some junk to the junk folder and check its content.

Only then do the next step

In [26]:
rmdir junk_folder/

rmdir: junk_folder/: Directory not empty
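Python behaves the same way as ```rmdir``` here: ```Path.rmdir()``` refuses to remove a folder that still has something in it and raises an ```OSError```. A minimal sketch, assuming a scratch folder and a made-up file called `junk.txt`:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    junk = Path(scratch) / "junk_folder"
    junk.mkdir()
    (junk / "junk.txt").write_text("some junk")   # made-up file

    # check what the folder holds before trying to remove it
    contents = [p.name for p in junk.iterdir()]

    try:
        junk.rmdir()          # only works on empty folders
        removed = True
    except OSError as err:
        removed = False
        print(f"Could not remove {junk.name}: {err}")
```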


In [29]:
os.chdir("..")

In [32]:
cd week-8-sample-folder/

/Users/sandeep.junnarkar/week-8-sample-folder


## Delete junk_folder and everything in it (unlike rmdir, this works on non-empty folders)

In [33]:
shutil.rmtree("junk_folder")

In [34]:
## show directory now USING OS
os.listdir()

['.DS_Store',
 'ceo_bios.csv',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']
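Because ```shutil.rmtree()``` deletes a folder and everything inside it with no undo, it can help to check that the path really is the folder you mean first. This is a sketch under that assumption, run against a scratch folder with made-up contents:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    junk = Path(scratch) / "junk_folder"
    (junk / "nested").mkdir(parents=True)
    (junk / "nested" / "junk.txt").write_text("some junk")   # made-up file

    # only delete if the folder is really there
    if junk.is_dir():
        shutil.rmtree(junk)

    still_there = junk.exists()

print(still_there)
```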

## back out of directory because you can't delete a folder while you're in it!

In [None]:
## show directory now USING OS


In [None]:
## Now delete all contents


In [None]:
## show directory now USING OS


## Zip folder and download using UNIX commands

In [None]:
## Use colab to download
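On the command line, zipping is typically done with ```zip -r```. As an alternative that stays in Python, the sketch below swaps in ```shutil.make_archive()```. The folder and its single file are stand-ins created in a scratch location; the Colab download lines are left commented out because they only work inside Google Colab.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    folder = Path(scratch) / "week-8-sample-folder"   # stand-in folder
    folder.mkdir()
    (folder / "delays.txt").write_text("placeholder text")

    archive_path = shutil.make_archive(
        str(Path(scratch) / "week-8-sample-folder"),  # archive name, without .zip
        "zip",                                        # archive format
        root_dir=scratch,                             # folder to zip from
        base_dir="week-8-sample-folder",              # what to put in the archive
    )

    # peek inside the archive to confirm what was zipped
    with zipfile.ZipFile(archive_path) as zf:
        zipped_names = zf.namelist()

    # In Google Colab only:
    # from google.colab import files
    # files.download(archive_path)

print(zipped_names)
```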


# Take a detour to fix last week's download issue.

# glob

## Yes, glob.

glob is a Python standard-library module that uses UNIX shell-style wildcards (such as ```*```) to collect specific files into a list.

## Using a path

We can store our path structure to a variable.

Right-click on the folder in the left column and copy path:
```/content/sample_data```

This is the raw path. We are already in ```content``` so instead we want:
```sample_data``` plus what files we are looking for (let's say all csv files).

In [None]:
## grab only the csv files
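A sketch of globbing with a folder in the pattern. It assumes a stand-in `sample_data` folder with empty placeholder files, built in a scratch location; in Colab the pattern would simply be ```"sample_data/*.csv"```.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    sample = Path(scratch) / "sample_data"        # stand-in folder
    sample.mkdir()
    for name in ["ceo_bios.csv", "energy.csv", "delays.txt"]:
        (sample / name).write_text("")            # empty placeholder files

    # folder + wildcard: every file ending in .csv inside sample_data
    pattern = os.path.join(scratch, "sample_data", "*.csv")
    csv_paths = sorted(glob.glob(pattern))
    csv_names = [os.path.basename(p) for p in csv_paths]

print(csv_names)
```

Note that ```glob.glob()``` returns paths that include the folder part of the pattern, so ```os.path.basename()``` is used here to show just the file names.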


In [35]:
pwd

'/Users/sandeep.junnarkar/week-8-sample-folder'

In [36]:
ls

alien-invasion.jpg  alien-invasion.png  delays.txt          manager.txt
alien-invasion.pdf  ceo_bios.csv        energy.csv          password.png


In [37]:
## grab only the .csv file(s)
my_csv_files = glob.glob("*.csv")
my_csv_files


['ceo_bios.csv', 'energy.csv']

In [39]:
my_txt_files = glob.glob("*.txt")
my_txt_files

['manager.txt', 'delays.txt']

In [44]:
## grab all the files whose names start with alien-invasion
x_files = glob.glob("alien-invasion*")
x_files

['alien-invasion.png', 'alien-invasion copy.jpg', 'alien-invasion.pdf']

In [46]:
all_files = glob.glob("*")
all_files

['invasion-alien.jpg',
 'ceo_bios.csv',
 'password.png',
 'energy.csv',
 'alien-1.png',
 'alien-invasion.png',
 'alien-invasion copy.jpg',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']
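The lists above come back in no particular order. If order matters, wrapping the result in ```sorted()``` is one option. The sketch below also shows the ```pathlib``` version of the same wildcard search; the files are empty placeholders in a scratch folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    folder = Path(scratch)
    for name in ["alien-invasion.png", "alien-invasion.pdf",
                 "energy.csv", "manager.txt"]:
        (folder / name).touch()                   # empty placeholder files

    # Path.glob() takes the same wildcard patterns as glob.glob()
    alien_files = sorted(p.name for p in folder.glob("alien-invasion*"))

    # filter by file extension when one wildcard is not enough
    images_and_pdfs = sorted(
        p.name for p in folder.iterdir() if p.suffix in {".png", ".pdf"}
    )

print(alien_files)
print(images_and_pdfs)
```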

In [None]:
## show directory now


In [None]:
## make a new directory called project_a


In [None]:
## show directory now


In [None]:
## change directory into project_a


In [None]:
## show directory now


In [None]:
## upload all our files to it


In [None]:
## show directory now
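One way the ```project_a``` steps above might look in code. This is only a sketch: it runs in a scratch folder with placeholder files, and it copies the files with ```shutil.copy()``` as a stand-in for uploading (in Google Colab, ```files.upload()``` from ```google.colab``` opens an upload dialog instead).

```python
import glob
import os
import shutil
import tempfile
from pathlib import Path

start_dir = os.getcwd()
scratch = tempfile.mkdtemp()          # hypothetical scratch area
os.chdir(scratch)

# placeholder files standing in for the sample files
for name in ["ceo_bios.csv", "energy.csv"]:
    Path(name).write_text("placeholder")

# make a new directory called project_a
Path("project_a").mkdir(exist_ok=True)

# copy the csv files into it (stand-in for uploading)
for csv_file in glob.glob("*.csv"):
    shutil.copy(csv_file, "project_a")

# change directory into project_a and show what it holds
os.chdir("project_a")
project_files = sorted(os.listdir())
print(project_files)

# clean up and return to the starting directory
os.chdir(start_dir)
shutil.rmtree(scratch)
```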


# Start reading files

In [None]:
## create a text wrapper object by "reading" the 'read_sample1.txt' file
## remember we are already in the test folder


## We can interpret this ```<class '_io.TextIOWrapper'>``` to read the actual contents
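A small sketch of what that looks like. The file name `read_sample1.txt` comes from the cell above, but its contents here are made up and written to a scratch folder so the example can run on its own.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    sample_path = Path(scratch) / "read_sample1.txt"
    sample_path.write_text("First line\nSecond line\n")   # made-up contents

    # open() hands back a text wrapper object, not the text itself
    with open(sample_path, "r") as myfile:
        wrapper_type = type(myfile)
        print(wrapper_type)

        # calling .read() on the wrapper returns the actual contents
        text = myfile.read()

print(text)
```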

In [None]:
## create a variable that holds our file name


In [None]:
## read and print entire file


In [None]:
## read and print 50 characters
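A sketch of the two reads described above: ```read()``` with no argument returns the whole file, while ```read(50)``` returns only the first 50 characters. The file contents are made up and written to a scratch folder.

```python
import tempfile
from pathlib import Path

file_name = "read_sample1.txt"
content = ("Line one of a made-up sample file.\n"
           "Line two has a little more text in it.\n")

with tempfile.TemporaryDirectory() as scratch:
    sample_path = Path(scratch) / file_name
    sample_path.write_text(content)

    # read and print the entire file
    with open(sample_path) as myfile:
        whole = myfile.read()
        print(whole)

    # read and print only the first 50 characters
    with open(sample_path) as myfile:
        first_50 = myfile.read(50)
        print(first_50)
```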


## Saving file to memory
So far, we haven't saved the text. 
The content is only available inside ```with open```.
If we try to read the lines, outside the ```with open```, we'll get a ```ValueError: I/O operation on closed file.```
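The error is easy to reproduce. In this sketch (made-up file contents, scratch folder), the file is closed as soon as the ```with open``` block ends, so reading from it afterwards fails.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    sample_path = Path(scratch) / "read_sample1.txt"
    sample_path.write_text("First line\nSecond line\n")   # made-up contents

    with open(sample_path) as myfile:
        pass   # nothing is read or saved inside the block

    # myfile still exists as a name, but the file behind it is closed
    try:
        myfile.read()
        error_message = None
    except ValueError as err:
        error_message = str(err)

print(error_message)
```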

## We fix that by saving what we read from the myfile object inside a variable

In [None]:
## read and hold the first 25 characters in a variable


In [None]:
## call the variable above


In [None]:
## read the first line into a variable


In [None]:
## call the variable above


In [None]:
## read the whole thing into a variable


In [None]:
## call the variable above
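The three reads above could be sketched like this. Each value is assigned inside ```with open``` and stays available after the file closes. The sample text is made up and written to a scratch folder.

```python
import tempfile
from pathlib import Path

sample_text = ("Delays were reported on three lines.\n"
               "Crews expect repairs by noon.\n"
               "Check back for updates.\n")

with tempfile.TemporaryDirectory() as scratch:
    sample_path = Path(scratch) / "read_sample1.txt"
    sample_path.write_text(sample_text)

    # hold the first 25 characters in a variable
    with open(sample_path) as myfile:
        first_25 = myfile.read(25)

    # hold the first line in a variable
    with open(sample_path) as myfile:
        first_line = myfile.readline()

    # hold the whole thing in a variable
    with open(sample_path) as myfile:
        whole_text = myfile.read()

# the file is closed now, but the variables still hold the text
print(first_25)
print(first_line)
print(whole_text)
```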


## It's more useful to save the text object inside a list. 
Remember, ```readlines()``` returns every line of the file as an item in a list.

In [None]:
## store entire text file in list



## We can then slice our list

In [None]:
## Show list item 3
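A sketch of storing the file as a list and slicing it. It assumes "item 3" means the third line, which is index 2 because list indexing starts at 0. The sample text is made up and written to a scratch folder.

```python
import tempfile
from pathlib import Path

sample_text = ("Delays were reported on three lines.\n"
               "Crews expect repairs by noon.\n"
               "Check back for updates.\n")

with tempfile.TemporaryDirectory() as scratch:
    sample_path = Path(scratch) / "read_sample1.txt"
    sample_path.write_text(sample_text)

    # store the entire text file in a list, one item per line
    with open(sample_path) as myfile:
        all_lines = myfile.readlines()

# third item in the list (index 2)
third_item = all_lines[2]

# slice: the first two lines
first_two = all_lines[:2]

# each item keeps its trailing newline; strip() removes it
clean_lines = [line.strip() for line in all_lines]

print(third_item)
print(first_two)
print(clean_lines)
```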
