# Files and Folders using Python

#### As we create and tap files for analysis, we need to stay organized programmatically.

- Let's understand Google Colab storage structure and how we access files.


- [Download the sample files](https://drive.google.com/file/d/1SyCRPxwF6svHUvBdNqmoLr6mmloaWcMd/view?usp=sharing) we will need.

In [1]:
## import libraries
# from google.colab import files ## code for downloading in google colab
import glob ## import the glob library for collecting specific files into a list


# Importing files

We can use an upload helper that is specific to Colab.

## *WARNING*: These are temporary uploads. When the runtime restarts, you need to re-upload.

```from google.colab import files```

```files.upload()```
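As a minimal sketch, the two lines above can be combined into one cell. The `try`/`except` guard and the `uploaded_names` variable are additions for illustration, so the cell also runs outside Colab, where `google.colab` is not installed.

```python
try:
    from google.colab import files  # only available inside Google Colab
except ImportError:
    files = None  # running somewhere else, such as a local Jupyter install

if files is not None:
    # Opens a file picker and returns a dict of {file name: file contents as bytes}
    uploaded = files.upload()
    uploaded_names = list(uploaded)
else:
    uploaded_names = []
    print("Not running in Colab; nothing was uploaded.")

print(uploaded_names)
```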

Let's confirm where we are:

In [None]:
## UPLOAD FILE


### Let's see if it uploaded
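One way to confirm where we are and check for the uploaded file is the standard `os` module. This is a sketch, not part of the original notebook; in Colab the working directory is typically `/content`.

```python
import os

# The current working directory; relative paths start from here
cwd = os.getcwd()
print(cwd)

# Everything in the working directory, sorted so the output is stable
entries = sorted(os.listdir(cwd))
print(entries)
```

An uploaded file should appear by name in the printed list.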

# glob

## Yes, glob.

glob is a module in Python's standard library that uses Unix shell-style wildcards (such as ```*```) to collect matching file paths into a list.
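To see how the wildcards behave, here is a small sketch that creates a throwaway folder and matches against it. The file names and the `match` helper are hypothetical, made up for this illustration.

```python
import glob
import os
import tempfile

# Hypothetical file names, created in a throwaway folder for illustration
names = ["report_2013.pdf", "report_2014.pdf", "report_2015.pdf", "notes.txt"]

with tempfile.TemporaryDirectory() as folder:
    for name in names:
        open(os.path.join(folder, name), "w").close()

    def match(pattern):
        """Return the sorted base names that match a pattern inside folder."""
        paths = glob.glob(os.path.join(folder, pattern))
        return sorted(os.path.basename(path) for path in paths)

    pdf_names = match("*.pdf")                  # * matches any run of characters
    early_names = match("report_201[34].pdf")   # [34] matches one character: 3 or 4
    one_char_names = match("report_201?.pdf")   # ? matches exactly one character

print(pdf_names)
print(early_names)
print(one_char_names)
```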

## Using a path

We can store our path in a variable.

Right-click on the folder in the left column and copy path:
```/content/sample_data```

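This is the absolute path. Our working directory is already ```content```, so the relative path is enough:
```sample_data``` plus a pattern for the files we are looking for (let's say all csv files). The examples below use a ```docs``` folder in the same way.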
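A minimal sketch of storing the folder in a variable and building the search pattern from it. The variable names `folder`, `csv_pattern` and `matches` are chosen for illustration.

```python
import glob
import os

# Relative path: no leading /content because we are already inside it
folder = "sample_data"

# Join the folder and the wildcard into one pattern
csv_pattern = os.path.join(folder, "*.csv")
print(csv_pattern)

# glob returns an empty list if the folder is missing or nothing matches
matches = glob.glob(csv_pattern)
print(matches)
```

Keeping the folder in one variable means only one line changes if the files move.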

In [2]:
## grab one specific csv file by its full name
x = glob.glob("docs/fla_count_as_of_2020-08-19_time_12_31_00.csv")
x

['docs/fla_count_as_of_2020-08-19_time_12_31_00.csv']

In [3]:
x = glob.glob("docs/*.csv")
x

['docs/fla_count_as_of_2020-08-19_time_11_46_00.csv',
 'docs/fla_count_as_of_2020-08-19_time_12_16_00.csv',
 'docs/fla_count_as_of_2020-08-19_time_11_31_00.csv',
 'docs/fla_count_as_of_2020-08-19_time_12_31_00.csv',
 'docs/fla_count_as_of_2020-08-19_time_12_01_00.csv']

In [6]:
y = glob.glob("docs/*.pdf")
y

['docs/adolph-coors-2015.pdf',
 'docs/adolph-coors-2014.pdf',
 'docs/adolph-coors-2013.pdf',
 'docs/beer-coors-2013 copy.pdf']

In [7]:
## grab all the files! 
all_files = glob.glob("docs/*")
all_files

['docs/fla_count_as_of_2020-08-19_time_11_46_00.csv',
 'docs/fla_count_as_of_2020-08-19_time_12_16_00.csv',
 'docs/adolph-coors-2015.pdf',
 'docs/adolph-coors-2014.pdf',
 'docs/fla_count_as_of_2020-08-19_time_11_31_00.csv',
 'docs/adolph-coors-2013.pdf',
 'docs/fla_count_as_of_2020-08-19_time_12_31_00.csv',
 'docs/read_sample2.txt',
 'docs/read_sample1.txt',
 'docs/fla_count_as_of_2020-08-19_time_12_01_00.csv',
 'docs/beer-coors-2013 copy.pdf']
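Notice that the listing above is not alphabetical: glob returns paths in whatever order the file system provides. A short sketch, using a few of the paths from the output above, of sorting the list and grouping it by extension (the `by_extension` name is made up for this example):

```python
import os
from collections import defaultdict

# A few of the paths shown in the listing above
all_files = [
    "docs/adolph-coors-2015.pdf",
    "docs/read_sample2.txt",
    "docs/fla_count_as_of_2020-08-19_time_11_31_00.csv",
    "docs/read_sample1.txt",
]

by_extension = defaultdict(list)
for path in sorted(all_files):             # sorted() gives a predictable order
    extension = os.path.splitext(path)[1]  # ".pdf", ".txt", ".csv"
    by_extension[extension].append(path)

print(dict(by_extension))
```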

In [8]:
## grab txt files only

text_files = glob.glob("docs/*.txt")

text_files

['docs/read_sample2.txt', 'docs/read_sample1.txt']

# Start reading files

In [10]:
## create a text wrapper object by "reading" the 'read_sample1.txt' file
## remember the path is relative to the folder we are already in
with open("docs/read_sample1.txt", "r") as my_text:
    print(type(my_text))

<class '_io.TextIOWrapper'>


## The ```<class '_io.TextIOWrapper'>``` object is only a handle to the file; we call its methods to read the actual contents

In [11]:
## create a variable that holds our file name
file_name = "docs/read_sample1.txt"

In [12]:
## read and print entire file
with open(file_name, "r") as my_text:
    print(my_text.read())

McDonald’s, Coca-Cola Hit Pause on Russia Amid Rising Backlash
By Leslie Patton and Brendan Case

McDonald’s Corp., Coca-Cola Co. and Starbucks Corp. are temporarily halting business operations in Russia amid an intensifying backlash since the invasion of Ukraine started nearly two weeks ago. 

The iconic U.S. brands, seen around the world as the face of U.S. capitalism, announced their decisions in a flurry of announcements on Tuesday afternoon, joining hundreds of other global companies that have halted work in Russia since the war began. PepsiCo Inc. said it would suspend soft drink sales in Russia but would continue to sell daily essentials such as milk and baby formula.



In [14]:
## read and print the first 60 characters
with open(file_name, "r") as my_text:
    print(my_text.read(60))

McDonald’s, Coca-Cola Hit Pause on Russia Amid Rising Backla
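One detail worth knowing: each call to `read(n)` continues from where the previous one stopped. The sketch below writes a short stand-in text to a temporary file (an assumption, so the example runs without the sample files) and reads it in two chunks.

```python
import os
import tempfile

# Stand-in text for illustration; not the full sample file
sample = "McDonald's, Coca-Cola Hit Pause on Russia\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write(sample)

    with open(path, "r", encoding="utf-8") as my_text:
        first_chunk = my_text.read(10)    # the first 10 characters
        second_chunk = my_text.read(10)   # the next 10, not the first 10 again

print(repr(first_chunk))
print(repr(second_chunk))
```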


## Saving file to memory
So far, we haven't saved the text.
The content is only available inside the ```with open``` block, because the file closes when the block ends.
If we try to read from the file outside the ```with open``` block, we'll get a ```ValueError: I/O operation on closed file.```

In [15]:
print(my_text.read(60))

ValueError: I/O operation on closed file.
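To check the closed-file behavior without stopping the notebook, the error can be caught. This is a sketch that uses a temporary stand-in file; the `message` variable is added for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("hello\n")

    with open(path, "r", encoding="utf-8") as my_text:
        pass  # nothing is read or saved while the file is open

    # The with block has ended, so the file is closed
    print(my_text.closed)

    try:
        my_text.read(60)
    except ValueError as error:
        message = str(error)

print(message)
```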

## We fix that by saving the file's contents in a variable

In [16]:
## read and hold the first 25 characters in a variable
with open(file_name, "r") as my_text:
    first_25 = my_text.read(25)
   

In [17]:
## call the variable above
first_25

'McDonald’s, Coca-Cola Hit'

In [19]:
## read the first line into a variable
with open(file_name, "r") as my_text:
    first_line = my_text.readline()

In [20]:
## call the variable above
first_line

'McDonald’s, Coca-Cola Hit Pause on Russia Amid Rising Backlash\n'

In [21]:
## read the whole thing into a variable
with open(file_name, "r") as my_text:
    all_text = my_text.read()

In [22]:
## call the variable above
all_text

'McDonald’s, Coca-Cola Hit Pause on Russia Amid Rising Backlash\nBy Leslie Patton and Brendan Case\n\nMcDonald’s Corp., Coca-Cola Co. and Starbucks Corp. are temporarily halting business operations in Russia amid an intensifying backlash since the invasion of Ukraine started nearly two weeks ago. \n\nThe iconic U.S. brands, seen around the world as the face of U.S. capitalism, announced their decisions in a flurry of announcements on Tuesday afternoon, joining hundreds of other global companies that have halted work in Russia since the war began. PepsiCo Inc. said it would suspend soft drink sales in Russia but would continue to sell daily essentials such as milk and baby formula.\n'

## It's more useful to save the text as a list. 
Remember, ```readlines()``` returns every line of the file as a separate item in a list.

In [23]:
## store entire text file in list
with open(file_name, "r") as my_text:
    all_text_list = my_text.readlines()

In [24]:
all_text_list

['McDonald’s, Coca-Cola Hit Pause on Russia Amid Rising Backlash\n',
 'By Leslie Patton and Brendan Case\n',
 '\n',
 'McDonald’s Corp., Coca-Cola Co. and Starbucks Corp. are temporarily halting business operations in Russia amid an intensifying backlash since the invasion of Ukraine started nearly two weeks ago. \n',
 '\n',
 'The iconic U.S. brands, seen around the world as the face of U.S. capitalism, announced their decisions in a flurry of announcements on Tuesday afternoon, joining hundreds of other global companies that have halted work in Russia since the war began. PepsiCo Inc. said it would suspend soft drink sales in Russia but would continue to sell daily essentials such as milk and baby formula.\n']
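Each item still ends in `\n`, and blank lines show up as their own items. A small sketch of cleaning that up, using the first three items from the output above (`clean_lines` is a name chosen for this example):

```python
# The first three items of the list shown above
all_text_list = [
    "McDonald’s, Coca-Cola Hit Pause on Russia Amid Rising Backlash\n",
    "By Leslie Patton and Brendan Case\n",
    "\n",
]

# strip() removes the trailing newline; the if drops lines that are empty afterward
clean_lines = [line.strip() for line in all_text_list if line.strip()]

print(clean_lines)
```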


## We can then pull items out of our list by index

In [25]:
## Show the first list item (index 0)
headline = all_text_list[0]
headline

'McDonald’s, Coca-Cola Hit Pause on Russia Amid Rising Backlash\n'

In [27]:
byline = all_text_list[1]
byline

'By Leslie Patton and Brendan Case\n'
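Indexing with a single number returns one string, while a slice with a colon returns a new list. A short sketch with stand-in strings in place of the real article lines:

```python
# Stand-in lines with the same layout as the list above
all_text_list = [
    "Headline\n",
    "Byline\n",
    "\n",
    "First paragraph\n",
    "\n",
    "Second paragraph\n",
]

headline = all_text_list[0]    # indexing: one string
top_two = all_text_list[:2]    # slicing: a list holding items 0 and 1
body = all_text_list[3:]       # everything from index 3 to the end
last_line = all_text_list[-1]  # negative indexes count from the end

print(top_two)
print(body)
print(last_line)
```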