<hr>
<div style="background-color: lightgray; padding: 20px; color: black;">
<div>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/Coursera-Logo_600x600.svg/1024px-Coursera-Logo_600x600.svg.png" style="float: right; margin-right: 30px;" width="120"/> 
<font size="6.5" color="#0056D2"><b>Composing File and Data Solutions</b></font> <br>
<font size="5.5" color="#0056D2"><b>Working with Data in Python </b></font> 
</div>
<div style="text-align: left">  <br>
Edison David Serrano Cárdenas. <br>
MSc in Applied Mathematics <br>
CIMAT - Sede Guanajuato <br>
</div>

</div>
<hr>

##  <font color="#0056D2" >**Objectives**</font> 
In this module, you will learn how to effectively use Python’s data structures to load, persist, and iterate over data. You will apply these data structures to solve different problems when working with popular data formats like JSON.

Load Packages:

In [7]:
import os

# <font color="#0056D2" >**Exploring Data Structures in Python**</font> 

<font color="#0056D2" >**Using Lists to Save and Retrieve Data in Python**</font> 

In [12]:
list_names = ["Pablo","Oscar"]
list_names.insert(0,"David")
print("List after insert:\t ",list_names)
print("Pablo index in list:\t ",list_names.index('Pablo'))

directories = os.listdir('..')
print("Files in main folder:\t ",directories)

List after insert:	  ['David', 'Pablo', 'Oscar']
Pablo index in list:	  1
Files in main folder:	  ['week1', '.git', 'README.md', 'LICENSE']


Calling `index` with a name that is not in the list raises a `ValueError`:

In [13]:
list_names.index("Alex")

ValueError: 'Alex' is not in list
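
One way to avoid the exception is to catch it and return a fallback value. The sketch below is only an illustration: `find_index` and its `default` parameter are hypothetical names, not part of the course material.

```python
def find_index(items, value, default=-1):
    """Return the position of value in items, or default when it is absent."""
    try:
        return items.index(value)
    except ValueError:
        # list.index raises ValueError when the value is missing
        return default


list_names = ["David", "Pablo", "Oscar"]
pablo_position = find_index(list_names, "Pablo")
alex_position = find_index(list_names, "Alex")
print(pablo_position, alex_position)
```

Checking `"Alex" in list_names` before calling `index` would also work, at the cost of scanning the list twice.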

<font color="#0056D2" >**Using Dictionaries to Save and Retrieve Data in Python**</font> 

In [None]:
contacts = {"name": "Alfredo", "lastname": "Deza"}
contacts.get("phone","Unknown")


'Unknown'

In [40]:
try:
    contacts['John']
except KeyError:
    # 'John' is not a key, so the lookup raises KeyError and this fallback runs
    print("Peter")

Peter


In [16]:
contacts.keys(), contacts.values()

(dict_keys(['name', 'lastname']), dict_values(['Alfredo', 'Deza']))

In [17]:
contacts["phone"]= "678-600-1111"
print(contacts)

{'name': 'Alfredo', 'lastname': 'Deza', 'phone': '678-600-1111'}
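
The cells above show two ways to handle a missing key: `get` with a fallback and catching `KeyError`. A related option is `setdefault`, which also stores the fallback. The following is a minimal sketch of the difference; the variable names are chosen only for illustration.

```python
contacts = {"name": "Alfredo", "lastname": "Deza"}

# get() returns the fallback but leaves the dictionary unchanged
phone = contacts.get("phone", "Unknown")
has_phone_after_get = "phone" in contacts

# setdefault() returns the fallback and also stores it under the key
stored_phone = contacts.setdefault("phone", "Unknown")
has_phone_after_setdefault = "phone" in contacts

print(phone, has_phone_after_get)
print(stored_phone, has_phone_after_setdefault)
```

Use `get` for read-only lookups and `setdefault` when the missing entry should be created on first access.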


<font color="#0056D2" >**Overview of Less Common Data Structures in Python**</font> 

In [24]:
unique = set()
unique.add(4)
unique.add(1)
s = unique.pop()
print(unique, s)

{4} 1
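
Sets are mostly useful for removing duplicates and for comparing groups of values. Note that `pop` removes an arbitrary element, so it should not be relied on for ordering. A short sketch, with made-up sample data:

```python
names = ["Pablo", "Oscar", "Pablo", "David", "Oscar"]

# Building a set drops the repeated names
unique_names = set(names)

# Hypothetical attendance lists for two weeks
week1 = {"Pablo", "Oscar"}
week2 = {"Oscar", "David"}

both = week1 & week2        # intersection: present in both weeks
either = week1 | week2      # union: present in at least one week
only_week1 = week1 - week2  # difference: present only in week 1

# Sets are unordered, so sort before printing for a stable result
print(sorted(unique_names))
print(sorted(both), sorted(either), sorted(only_week1))
```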


<font color="#0056D2" >**Iterating over Dictionaries in Python**</font> 

In [27]:
contacts = {"Alfredo": "alfredo@example.org", "Kennedy": "kennedy@example.org", "Noah": "noah@example.org"}
for name, email in contacts.items():
    print(name, email)


Alfredo alfredo@example.org
Kennedy kennedy@example.org
Noah noah@example.org
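
The same `items()` loop combines well with `sorted` and `enumerate`, and with a dictionary comprehension when a new dictionary is needed. This is a small sketch; the `lines` and `domains` names are only illustrative.

```python
contacts = {
    "Noah": "noah@example.org",
    "Alfredo": "alfredo@example.org",
    "Kennedy": "kennedy@example.org",
}

# sorted() orders the (name, email) pairs by name; enumerate() numbers them
lines = []
for position, (name, email) in enumerate(sorted(contacts.items()), start=1):
    lines.append(f"{position}. {name} <{email}>")

# A dictionary comprehension builds a new mapping from the same pairs
domains = {name: email.split("@")[1] for name, email in contacts.items()}

print("\n".join(lines))
print(domains)
```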


<font color="#0056D2" >**Storing Data Between Data Structures in Python**</font> 



In [32]:
# Root folder of the repository used throughout these notes
root = '/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/'
home_items = os.listdir(root)
home_content = {"files": [], "directories": []}

home_paths = [os.path.join(root, item) for item in home_items]

# Sort each path into the matching bucket
for path in home_paths:
    if os.path.isdir(path):
        home_content['directories'].append(path)
    elif os.path.isfile(path):
        home_content['files'].append(path)

print(home_content)

{'files': ['/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/README.md', '/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/LICENSE'], 'directories': ['/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/week1', '/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git']}


In [34]:
for item in home_content['files']:
    print(item)

/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/README.md
/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/LICENSE
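
The same list-into-dictionary pattern can be wrapped in a function so it works for any folder. The sketch below uses `os.scandir` instead of `os.listdir`, which is a substitution on my part, and runs against a temporary folder so it does not depend on the repository path. `classify_entries` is a hypothetical helper name.

```python
import os
import tempfile


def classify_entries(root):
    """Group the entries directly under root into files and directories."""
    content = {"files": [], "directories": []}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir():
                content["directories"].append(entry.path)
            elif entry.is_file():
                content["files"].append(entry.path)
    return content


# Build a tiny throwaway folder to exercise the function
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "week1"))
    with open(os.path.join(root, "README.md"), "w") as handle:
        handle.write("demo")

    result = classify_entries(root)
    file_names = [os.path.basename(path) for path in result["files"]]
    directory_names = [os.path.basename(path) for path in result["directories"]]

print(file_names, directory_names)
```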


<font color="#0056D2" >**Walking the filesystem, inspecting files**</font> 




In [44]:
import os

# os.walk yields one (current directory, subdirectories, files) tuple
# for each directory level it traverses
for path_info in os.walk('..'):
    print(path_info)
    break
    

('..', ['week1', '.git'], ['README.md', 'LICENSE'])
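
Because `os.walk` descends into every subdirectory, hidden folders such as `.git` are traversed too. With the default top-down order, editing the subdirectory list in place tells `os.walk` which folders to skip. The following is a sketch under that assumption; `walk_without_hidden` is a hypothetical helper and the folder layout is created only for the demo.

```python
import os
import tempfile


def walk_without_hidden(root):
    """Return the visited directories, relative to root, skipping hidden ones."""
    visited = []
    for top_dir, directories, _files in os.walk(root):
        # Slice assignment mutates the list os.walk holds, pruning the traversal
        directories[:] = sorted(d for d in directories if not d.startswith("."))
        visited.append(os.path.relpath(top_dir, root))
    return visited


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, ".git", "hooks"))
    os.makedirs(os.path.join(root, "week1"))
    visited = walk_without_hidden(root)

print(visited)
```

Rebinding the name (`directories = [...]`) would not prune anything, since `os.walk` keeps its own reference to the original list.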


In [47]:
import os
from os.path import abspath, join


# producing absolute paths, instead of a tuple of three items
for top_dir, directories, files in os.walk('..'):
    for directory in directories:
        print(abspath(join(top_dir, directory)))
    print("\n")
    for _file in files:
        print(abspath(join(top_dir, _file)))
    break

/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/week1
/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git


/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/README.md
/workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/LICENSE


In [49]:
# Now that absolute paths are shown, we can inspect them for file metadata

import os
from os.path import abspath, join, getsize

sizes = {}

for top_dir, directories, files in os.walk('..'):
    for _file in files:
        full_path = abspath(join(top_dir, _file))
        size = getsize(full_path)
        sizes[full_path] = size

# Sort the paths by size, largest first
sorted_results = sorted(sizes, key=sizes.get, reverse=True)

# Report the ten largest files
for path in sorted_results[:10]:
    print("Path: {0}, size: {1}".format(path, sizes[path]))

Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/LICENSE, size: 35149
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git/objects/pack/pack-d85bdce49633cef446c881a006743750d3371cfb.pack, size: 14347
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/week1/notes_week1.ipynb, size: 11559
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git/hooks/pre-rebase.sample, size: 4898
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git/hooks/fsmonitor-watchman.sample, size: 4726
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git/hooks/update.sample, size: 3650
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/README.md, size: 3501
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git/hooks/push-to-checkout.sample, size: 2783
Path: /workspaces/Scripting-with-Python-and-SQL-for-Data-Engineering/.git/hooks/sendemail-validate.sample, size:
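
The size report above can be packaged as a reusable function that returns `(path, size)` pairs instead of printing them. This is a minimal sketch: `largest_files` and its `limit` parameter are hypothetical names, and the demo files are created in a temporary folder with known sizes.

```python
import os
import tempfile


def largest_files(root, limit=10):
    """Return up to limit (absolute path, size in bytes) pairs, largest first."""
    sizes = {}
    for top_dir, _directories, files in os.walk(root):
        for file_name in files:
            full_path = os.path.abspath(os.path.join(top_dir, file_name))
            sizes[full_path] = os.path.getsize(full_path)
    # Sort the (path, size) pairs by size, in descending order
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    demo_files = [("a.txt", 10), (os.path.join("sub", "b.txt"), 30), ("c.txt", 20)]
    for relative_path, size in demo_files:
        # Binary mode guarantees the file is exactly `size` bytes long
        with open(os.path.join(root, relative_path), "wb") as handle:
            handle.write(b"x" * size)

    top_two = [
        (os.path.basename(path), size)
        for path, size in largest_files(root, limit=2)
    ]

print(top_two)
```

Sorting `sizes.items()` keeps each path next to its size, so no second dictionary lookup is needed when reporting.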

## <font color="#0056D2" >**Introduction to Data Sources and Formats in Python**</font> 

<font color="#0056D2" >**Loading Data from Files and File Paths in Python**</font> 