# CSV/URL Read and Write/os.walk Module Tutorial

- This guide shows how to open, read, and print the contents of a CSV file, using a list of datasets from the [Humanitarian Data Exchange](https://data.humdata.org/dataset).
- The `os.walk` function is given special emphasis because it lets you work down a directory tree and sift through multiple files.
- It also includes a brief look at the `pandas` module and its DataFrame capabilities for displaying CSV records.

## 1. Import Modules

- The modules below are used to open and read a CSV, list its columns and rows, and show the folder structure.

In [1]:
import csv # This module provides classes for reading and writing CSV files
import os # This module provides a way of using operating system dependent functionality. It provides a way of interacting with the file system, such as creating, moving, and deleting files and directories.
import pandas # This module provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It is widely used for data analysis, manipulation, and visualization.

## 2. Open and Read CSV (Humanitarian Data Example)

- The following opens the CSV and, for every row, prints the values in three columns: Title, Alternative Title, and Description.

In [2]:
with open('HumanURLS.csv', mode='r') as csv_file: # Opens the CSV file for reading via 'r'
    csv_reader = csv.DictReader(csv_file) # Creates a reader that returns each row as a dictionary keyed by the column names
    line_count = 0 # Keeps track of how many lines have been processed
    for row in csv_reader:
        if line_count == 0: # On the first row, print the column names
            print(f'Column names are {", ".join(row)}')
            line_count += 1 # Counts the header line
        print(f'\t{row["Title"]} , {row["Alternative Title"]} , {row["Description"]}.')
        line_count += 1 # Counts the data row
    print(f'Processed {line_count} lines.')

Column names are Title, Alternative Title, Description, Language, Creator, Resource Class, ISO Topic Categories, Keyword, Date Issued, Temporal Coverage, Date Range, Update Frequency, Spatial Coverage, Bounding Box, Resource Type, Format, Information, Download, ID, Identifier, Provider, Code, Member Of, Status, Accrual Method, Date Accessioned, Rights, License, Access Rights, Suppressed, Child Record, File Size
	Beirut port explosion operational zones , Beirut port explosion operational zones , Beirut port explosion operational zonesUPDATED 9 October 2020:Enlarged zones with UN HABITAT socio-economic classification and the 3Km priority of the Flash Appeal priority radius added.   Please see the methodology section in the metadata for further information.UPDATED 19 August 2020:11 operational zones numbered 129 to 139 inclusive added to the Sinn El-Fil cadastre south of the origional set.Zones spreadsheet added.NOTESUsers are referred to the compatible Beirut buildings footprints COD.The
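
As a minimal sketch of the same `csv.DictReader` pattern, the cell below reads a small in-memory sample instead of `HumanURLS.csv`. The sample rows are hypothetical stand-ins, not the real data. It takes the column names from `reader.fieldnames`, which avoids checking for the first row inside the loop.

```python
import csv
import io

# Hypothetical two-row sample standing in for HumanURLS.csv (not the real data).
sample = io.StringIO(
    "Title,Alternative Title,Description\n"
    "Zones,Zones (alt),Operational zones\n"
    "Floods,Floods (alt),Flood extent\n"
)

reader = csv.DictReader(sample)
# fieldnames holds the header row, so no first-row check is needed in the loop.
print(f'Column names are {", ".join(reader.fieldnames)}')

rows = list(reader)  # Each row is a dictionary keyed by the header names
for row in rows:
    print(f'\t{row["Title"]} , {row["Alternative Title"]} , {row["Description"]}.')
print(f'Processed {len(rows)} data rows.')
```

Unlike `line_count` above, `len(rows)` counts only the data rows, not the header line.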

## 3. Different Code Implementation Example

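- Similar to the above example, but more consolidated and compact: `csv.reader` returns each row as a plain list of strings rather than a dictionary. `HumanURLS.csv` is comma-separated, so with `delimiter='|'` a line that contains no `|` is read as a single field and printed unchanged.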

In [3]:
with open('HumanURLS.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='|', quotechar='|') # delimiter: the character that separates fields; quotechar: the character that quotes fields containing the delimiter
    for row in spamreader: # Each row is a list of strings
        print(', '.join(row))

Title,Alternative Title,Description,Language,Creator,Resource Class,ISO Topic Categories,Keyword,Date Issued,Temporal Coverage,Date Range,Update Frequency,Spatial Coverage,Bounding Box,Resource Type,Format,Information,Download,ID,Identifier,Provider,Code,Member Of,Status,Accrual Method,Date Accessioned,Rights,License,Access Rights,Suppressed,Child Record,File Size
Beirut port explosion operational zones,Beirut port explosion operational zones,Beirut port explosion operational zonesUPDATED 9 October 2020:Enlarged zones with UN HABITAT socio-economic classification and the 3Km priority of the Flash Appeal priority radius added.   Please see the methodology section in the metadata for further information.UPDATED 19 August 2020:11 operational zones numbered 129 to 139 inclusive added to the Sinn El-Fil cadastre south of the origional set.Zones spreadsheet added.NOTESUsers are referred to the compatible Beirut buildings footprints COD.The standard Administrative Boundary Common Operational 
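
To illustrate how the `delimiter` setting changes what `csv.reader` returns, here is a small sketch that parses the same line twice. The line is a hypothetical sample, chosen so that one quoted field contains a comma.

```python
import csv
import io

# Hypothetical one-line sample; the quoted field contains a comma.
line = 'Title,Keyword,"July 24, 2020-July 24, 2020"\n'

# Default settings: comma delimiter, double-quote quotechar.
comma_row = next(csv.reader(io.StringIO(line)))

# Pipe delimiter on comma-separated text: no pipe is found,
# so the whole line comes back as one field.
pipe_row = next(csv.reader(io.StringIO(line), delimiter='|'))

print(len(comma_row), comma_row)
print(len(pipe_row), pipe_row)
```

With the default settings the quoted date range stays together as one field, while the pipe-delimited read returns the entire line as a single string.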

## 4. Using pandas DataFrames for Better Readability

- Read the CSV into a pandas DataFrame, which organizes the records into a table of labeled columns and numbered rows.

In [4]:
df = pandas.read_csv('HumanURLS.csv')
print(df) # Prints the contents of the DataFrame to the console

                                               Title  \
0            Beirut port explosion operational zones   
1  Population potentially exposed to floods betwe...   
2  Satellite detected water extent as of 21 July ...   

                                   Alternative Title  \
0            Beirut port explosion operational zones   
1  Population potentially exposed to floods betwe...   
2  Satellite detected water extent as of 21 July ...   

                                         Description Language  \
0  Beirut port explosion operational zonesUPDATED...      eng   
1  UNOSAT code: FL20200713BGD  This map illustrat...      eng   
2  UNOSAT code: FL20200713BGD  This map illustrat...      eng   

                                             Creator Resource Class  \
0                                       OCHA Lebanon       Datasets   
1  UN Operational Satellite Applications Programm...       Datasets   
2  UN Operational Satellite Applications Programm...       Datasets   

   I

In [5]:
df # Cleaner/tabular format

|  | Title | Alternative Title | Description | Language | Creator | Resource Class | ISO Topic Categories | Keyword | Date Issued | Temporal Coverage | ... | Member Of | Status | Accrual Method | Date Accessioned | Rights | License | Access Rights | Suppressed | Child Record | File Size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Beirut port explosion operational zones | Beirut port explosion operational zones | Beirut port explosion operational zonesUPDATED... | eng | OCHA Lebanon | Datasets |  | common operational dataset - cod | 10/9/20 | August 11, 2020-August 11, 2020 | ... | 99-1400 | Active | HTML | 10/31/21 |  | Other: humanitarian use only | Public | False | False | 37.3K |
| 1 | Population potentially exposed to floods betwe... | Population potentially exposed to floods betwe... | UNOSAT code: FL20200713BGD This map illustrat... | eng | UN Operational Satellite Applications Programm... | Datasets |  | floods - storm surges\|geodata | 8/11/20 | July 24, 2020-July 24, 2020 | ... | 99-1400 | Active | HTML | 10/31/21 |  | Creative Commons Attribution Share-Alike | Public | False | False |  |
| 2 | Satellite detected water extent as of 21 July ... | Satellite detected water extent as of 21 July ... | UNOSAT code: FL20200713BGD This map illustrat... | eng | UN Operational Satellite Applications Programm... | Datasets |  | floods - storm surges\|geodata | 7/24/20 | July 22, 2020-July 22, 2020 | ... | 99-1400 | Active | HTML | 10/31/21 |  | Creative Commons Attribution Share-Alike | Public | False | False |  |
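
Because the full table is wide, it can help to inspect its shape and pick out a few columns. The sketch below assumes `pandas` is installed and uses a hypothetical three-column sample in place of `HumanURLS.csv`. The same calls should work on the `df` loaded above.

```python
import io
import pandas

# Hypothetical sample standing in for HumanURLS.csv (not the real data).
sample = io.StringIO(
    "Title,Language,Creator\n"
    "Zones,eng,OCHA Lebanon\n"
    "Floods,eng,UNOSAT\n"
)
df = pandas.read_csv(sample)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # Column names taken from the header row

subset = df[["Title", "Creator"]]  # Keep only two columns
print(subset.head())               # Show the first rows of the narrower table
```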


## 5. os.walk File Walkthrough

- The `os.walk` function walks a directory tree starting from a given path. For each folder it yields the folder's path (root), its sub-folders (dirs), and its files (files), then continues into each sub-folder.

In [6]:
if __name__ == "__main__":
    for (root, dirs, files) in os.walk('.', topdown=True): # Generates the file names in a directory tree by walking the tree either top-down or bottom-up
        # Print the name of the directory (root), a list of its subdirectories (dirs), and a list of its files (files)
        print(root)
        print(dirs)
        print(files)
        print('<<<<<<<<<<->>>>>>>>>') # Separator line

.
['.ipynb_checkpoints']
['.DS_Store', 'T-02_iterating-files.ipynb', 'HumanURLS.csv', 'CSV URLs OSWalk Tutorial .ipynb']
<<<<<<<<<<->>>>>>>>>
./.ipynb_checkpoints
[]
[]
<<<<<<<<<<->>>>>>>>>

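A common next step is to use the walk to collect only the files of interest. The sketch below gathers every `.csv` path under a folder and skips hidden directories such as `.ipynb_checkpoints`. The helper name `find_csv_files` and the temporary folder layout are hypothetical, made up for illustration.

```python
import os
import tempfile

def find_csv_files(top):
    """Return sorted paths of all .csv files under `top`, skipping hidden directories."""
    matches = []
    for root, dirs, files in os.walk(top, topdown=True):
        # With topdown=True, editing dirs in place stops os.walk from entering them.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if name.lower().endswith('.csv'):
                matches.append(os.path.join(root, name))
    return sorted(matches)

# Build a small throwaway tree so the example does not depend on local files.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'data'))
    os.makedirs(os.path.join(top, '.ipynb_checkpoints'))
    for rel in ('HumanURLS.csv',
                os.path.join('data', 'extra.csv'),
                os.path.join('.ipynb_checkpoints', 'old.csv'),
                'notes.txt'):
        open(os.path.join(top, rel), 'w').close()  # Create an empty file

    found = [os.path.relpath(p, top) for p in find_csv_files(top)]
    print(found)
```

Pruning `dirs` in place only works when walking top-down. With `topdown=False` the sub-folders have already been visited by the time their parent is reported.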