# Backslash on Windows and Forward Slash on OS X and Linux
On Windows, paths are written using backslashes ( \ ) as the separator between folder names. OS X and Linux, however, use the forward slash ( / ) as their path separator. If you want your programs to work on all operating systems, you will have to write your Python scripts to handle both cases.

Fortunately, this is simple to do with the `os.path.join()` function. If you pass it the string values of individual file and folder names in your path, `os.path.join()` will return a string with a file path using the correct path separators. Enter the following:

In [9]:
import os
os.path.join('user','bin','spam')

'user\\bin\\spam'

The `os.path.join()` function is helpful if you need to create strings for filenames.

In [8]:
myFiles = ['accounts.txt','details.csv','invite.docx']

In [10]:
for filename in myFiles:
  print(os.path.join('C:/user/bin/',filename))

C:/user/bin/accounts.txt
C:/user/bin/details.csv
C:/user/bin/invite.docx
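To compare both separator styles on a single machine, the sketch below calls the two platform-specific implementations directly: `ntpath` (Windows rules) and `posixpath` (OS X and Linux rules). Both are standard-library modules, and `os.path` is whichever of the two matches the operating system running the code. The variable names are illustrative.

```python
import ntpath      # path functions that follow Windows rules
import posixpath   # path functions that follow OS X / Linux rules

parts = ['user', 'bin', 'spam']

windows_path = ntpath.join(*parts)
posix_path = posixpath.join(*parts)

print(windows_path)  # user\bin\spam
print(posix_path)    # user/bin/spam
```

In everyday scripts you would keep calling `os.path.join()` and let Python pick the right implementation.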


# The Current Working Directory
You can get the current working directory as a string value with the `os.getcwd()` function and change it with `os.chdir()`.

In [11]:
import os
os.getcwd()

'c:\\Users\\luis\\Downloads\\organizing_files'

In [12]:
!curl -o datasets.zip https://github.com/carloslme/intro-data-engineering/raw/main/02-python-fundamentals/task-automation/organize-files/datasets.zip


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  216k    0  216k    0     0   252k      0 --:--:-- --:--:-- --:--:--  252k


> **IMPORTANT!**  
Unzip the `datasets.zip` file manually

In [46]:
import os
import zipfile

zip_file_path = 'datasets.zip'
extracted_folder = 'extracted_data'

# Create the directory for the extracted files if it does not exist
if not os.path.exists(extracted_folder):
    os.makedirs(extracted_folder)

# Extract the zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extracted_folder)

# Change to the directory where the files were extracted
os.chdir(extracted_folder)

# The working directory is now the folder that holds the zip's contents


BadZipFile: File is not a zip file
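A `BadZipFile` error means the file on disk is not a valid zip archive. One way to avoid the exception is to test the file with `zipfile.is_zipfile()` before extracting. The sketch below is a minimal illustration under stated assumptions: `safe_extract` is a hypothetical helper, and the two small files it is tried on are created in a temporary folder rather than downloaded.

```python
import os
import tempfile
import zipfile

def safe_extract(zip_path, target_folder):
    """Extract zip_path into target_folder; return False if it is not a zip archive."""
    if not zipfile.is_zipfile(zip_path):
        return False
    os.makedirs(target_folder, exist_ok=True)
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(target_folder)
    return True

with tempfile.TemporaryDirectory() as tmp:
    # A text file that only carries a .zip extension
    fake_zip = os.path.join(tmp, 'fake.zip')
    with open(fake_zip, 'w') as f:
        f.write('<html>not a zip</html>')

    # A genuine archive with a single member
    real_zip = os.path.join(tmp, 'real.zip')
    with zipfile.ZipFile(real_zip, 'w') as zf:
        zf.writestr('hello.txt', 'Hello World!')

    fake_ok = safe_extract(fake_zip, os.path.join(tmp, 'out_fake'))
    real_ok = safe_extract(real_zip, os.path.join(tmp, 'out_real'))
    extracted = os.listdir(os.path.join(tmp, 'out_real'))

print(fake_ok, real_ok, extracted)  # False True ['hello.txt']
```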

In [43]:
os.chdir('./datasets.zip')

NotADirectoryError: [WinError 267] The directory name is invalid: './datasets.zip'

In [17]:
os.getcwd()

'c:\\Users\\luis\\Downloads\\organizing_files'

If the folder does not exist, `os.chdir()` raises a `FileNotFoundError`:

In [18]:
os.chdir('/ThisFolderDoesNotExist')

FileNotFoundError: [WinError 2] The system cannot find the file specified: '/ThisFolderDoesNotExist'
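A script can catch this error instead of stopping. The sketch below is one possible pattern, not the only one: it remembers the starting directory, attempts a change that fails, performs one that succeeds, and then returns to where it started. The temporary folder stands in for a real project folder.

```python
import os
import tempfile

original_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, 'ThisFolderDoesNotExist')
    try:
        os.chdir(missing)
        changed = True
    except FileNotFoundError:
        changed = False
        print(f'Cannot change to {missing!r}: folder not found')

    # Changing to a folder that does exist works as expected
    os.chdir(tmp)
    inside_tmp = os.path.samefile(os.getcwd(), tmp)

    # Go back before the temporary folder is deleted
    os.chdir(original_dir)

print(changed, inside_tmp)  # False True
```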

# Absolute vs. Relative Paths
There are two ways to specify a file path. 
* An absolute path, which always begins with the root folder
* A relative path, which is relative to the program’s current working directory


There are also the dot (.) and dot-dot (..) folders.
* A single period for a folder name is shorthand for "this directory".
* Two periods mean "the parent folder".
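As a small illustration of these ideas, the sketch below uses `posixpath` (the OS X and Linux flavor of `os.path`) so that the results are the same on every machine. `isabs()` distinguishes absolute from relative paths, and `normpath()` resolves the dot and dot-dot entries. The `extra` folder name is made up for the example.

```python
import posixpath

absolute = posixpath.isabs('/datasets/README.md')
relative = posixpath.isabs('datasets/README.md')
print(absolute, relative)  # True False

# '.' means "this directory", so it disappears when the path is normalized
same_dir = posixpath.normpath('/datasets/./README.md')
# '..' means "the parent folder", so it cancels the folder before it
parent_dir = posixpath.normpath('/datasets/extra/../README.md')
print(same_dir)    # /datasets/README.md
print(parent_dir)  # /datasets/README.md
```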

# Creating New Folders with `os.makedirs()`
`os.makedirs()` will create any necessary intermediate folders in order to ensure that the full path exists.


In [19]:
!pip install requests









In [20]:
	
import requests

url = "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip"
filename = "kagglecatsanddogs_5340.zip"

response = requests.get(url, stream=True)
if response.status_code == 200:
    with open(filename, "wb") as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)
    print(f"Downloaded '{filename}' successfully.")
else:
    print(f"Failed to download '{filename}'. Status code: {response.status_code}")


Downloaded 'kagglecatsanddogs_5340.zip' successfully.


In [22]:
!curl -o kagglecatsanddogs_5340.zip "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip"



In [23]:
import os
os.makedirs("test")

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'test'
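`os.makedirs()` raises `FileExistsError` when the final folder is already there. Passing `exist_ok=True` tells it to accept an existing folder instead. The sketch below shows both behaviors inside a temporary folder, so it leaves nothing behind; the nested folder names mirror the ones used in this section.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, 'parent', 'son', 'grandson')

    os.makedirs(nested)          # creates parent, son and grandson in one call

    try:
        os.makedirs(nested)      # second call fails: the folder already exists
        raised = False
    except FileExistsError:
        raised = True

    os.makedirs(nested, exist_ok=True)   # no error this time
    still_there = os.path.isdir(nested)

print(raised, still_there)  # True True
```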

In [24]:
import os
os.makedirs('./dummy_directories/parent/son/grandson')

In [26]:
!dir .\dummy_directories


 Volume in drive C is BOOTCAMP
 Volume Serial Number is CAE9-9A70

 Directory of c:\Users\luis\Downloads\organizing_files\dummy_directories

24/08/2023  10:52 a. m.    <DIR>          .
24/08/2023  10:52 a. m.    <DIR>          ..
24/08/2023  10:52 a. m.    <DIR>          parent
               0 File(s)              0 bytes
               3 Dir(s)  109,776,609,280 bytes free


In [28]:
!dir .\dummy_directories\parent

 Volume in drive C is BOOTCAMP
 Volume Serial Number is CAE9-9A70

 Directory of c:\Users\luis\Downloads\organizing_files\dummy_directories\parent

24/08/2023  10:52 a. m.    <DIR>          .
24/08/2023  10:52 a. m.    <DIR>          ..
24/08/2023  10:52 a. m.    <DIR>          son
               0 File(s)              0 bytes
               3 Dir(s)  109,773,053,952 bytes free


In [29]:
!dir .\dummy_directories\parent\son

 Volume in drive C is BOOTCAMP
 Volume Serial Number is CAE9-9A70

 Directory of c:\Users\luis\Downloads\organizing_files\dummy_directories\parent\son

24/08/2023  10:52 a. m.    <DIR>          .
24/08/2023  10:52 a. m.    <DIR>          ..
24/08/2023  10:52 a. m.    <DIR>          grandson
               0 File(s)              0 bytes
               3 Dir(s)  109,761,658,880 bytes free


# The `os.path` Module
The `os.path` module contains many helpful functions related to filenames and file paths.



## Handling Absolute and Relative Paths
* Calling `os.path.abspath(path)` will return a string of the absolute path of the argument. This is an easy way to convert a relative path into an absolute one. 
* Calling `os.path.isabs(path)` will return True if the argument is an absolute path and False if it is a relative path. 
* Calling `os.path.relpath(path, start)` will return a string of a relative path from the `start` path to `path`. If `start` is not provided, the current working directory is used as the start path.

In [30]:
os.path.abspath('.')

'c:\\Users\\luis\\Downloads\\organizing_files'

In [31]:
os.path.isabs('.')

False

In [32]:
os.path.relpath('/dummy_directories/parent','/dummy_directories/')

'parent'

The call below computes the relative path that leads from the start path to the target path. Here the target is `/dummy_directories/` and the start is `/dummy_directories/parent/son/grandson/`. Getting from the start to the target means going up three directories (from `grandson` to `son`, then to `parent`, then to `dummy_directories`), so the result is `'../../..'`, which Windows displays as `'..\\..\\..'`.

Note that the relative path is calculated based on the directory structure, not the actual existence of directories. The function doesn't check whether the directories exist in the file system.

In [33]:
os.path.relpath('/dummy_directories/','/dummy_directories/parent/son/grandson/')

'..\\..\\..'
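The same calculations can be reproduced on any operating system with `posixpath`, the forward-slash flavor of `os.path`. The sketch below repeats the two calls from this section and adds a third that goes up one level and then down into a sibling folder. The `daughter` folder is hypothetical and, as noted above, does not need to exist.

```python
import posixpath

# Down one level: from /dummy_directories to its child
down = posixpath.relpath('/dummy_directories/parent', '/dummy_directories')

# Up three levels: from grandson back to /dummy_directories
up = posixpath.relpath('/dummy_directories',
                       '/dummy_directories/parent/son/grandson')

# Up one level, then down into a sibling folder
sideways = posixpath.relpath('/dummy_directories/parent/daughter',
                             '/dummy_directories/parent/son')

print(down)      # parent
print(up)        # ../../..
print(sideways)  # ../daughter
```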

* Calling `os.path.dirname(path)` will return a string of everything that comes before the last slash in the path argument. 
* Calling `os.path.basename(path)` will return a string of everything that comes after the last slash in the path argument.

In [34]:
path = '/datasets/README.md'
os.path.basename(path)

'README.md'

In [35]:
os.path.dirname(path)

'/datasets'

`os.path.split()` is a nice shortcut if you need both values.

In [36]:
californiaFilePath = '/datasets/california_housing_test.csv'
os.path.split(californiaFilePath)

('/datasets', 'california_housing_test.csv')

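`os.path.sep` is not a function but a string that holds the path separator of the current operating system. Passing it to the string method `split()` returns a list with each part of the path. On Windows `os.path.sep` is a backslash, so a path written with forward slashes is not split at all, as the output below shows.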

In [37]:
californiaFilePath.split(os.path.sep)

['/datasets/california_housing_test.csv']
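The single-element list above comes from splitting a forward-slash path on the Windows separator. The sketch below shows the difference side by side, using `posixpath.sep` and `ntpath.sep` so that it behaves the same on every machine. Normalizing the path first with `normpath()` is one possible fix, offered here as a suggestion rather than the only approach.

```python
import ntpath
import posixpath

path = '/datasets/california_housing_test.csv'

# Splitting on '/' separates the folders (the leading '' comes from the root slash)
posix_parts = path.split(posixpath.sep)

# Splitting on '\\' finds no separator, so the path comes back whole
windows_unsplit = path.split(ntpath.sep)

# normpath() converts the slashes to the Windows separator first
windows_parts = ntpath.normpath(path).split(ntpath.sep)

print(posix_parts)      # ['', 'datasets', 'california_housing_test.csv']
print(windows_unsplit)  # ['/datasets/california_housing_test.csv']
print(windows_parts)    # ['', 'datasets', 'california_housing_test.csv']
```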

# Finding File Sizes and Folders Contents
The `os.path` and `os` modules provide functions for finding the size of a file in bytes and for listing the files and folders inside a given folder.
* Calling `os.path.getsize(path)` will return the size in bytes of the file in the path argument. 
* Calling `os.listdir(path)` will return a list of filename strings for each file in the path argument. (Note that this function is in the os module, not `os.path` .)

In [38]:
!cd

c:\Users\luis\Downloads\organizing_files


In [47]:
import os

os.path.getsize('./california_housing_test.csv')

FileNotFoundError: [WinError 2] The system cannot find the file specified: './california_housing_test.csv'

In [48]:
os.listdir('./')

['.git',
 '.gitignore',
 '.secrets',
 'datasets.zip',
 'dummy_directories',
 'extracted_data',
 'kagglecatsanddogs_5340.zip',
 'Project_DE-E2E',
 'reading_writing_files.ipynb',
 'test',
 'venv']

In [49]:
# Total size of the entries directly inside the current directory
# (a subfolder contributes only its own entry size, not its contents)
totalSize = 0
for filename in os.listdir('./'):
    totalSize += os.path.getsize(os.path.join('./', filename))
print(totalSize)

825144102
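The loop above only looks at the entries directly inside the folder, and calling `os.path.getsize()` on a subfolder does not add up the files it contains. If the goal is the size of everything under a folder, one option is `os.walk()`, which visits every subfolder. The sketch below is a minimal version of that idea; `total_file_size` is a hypothetical helper, and the files it measures are created in a temporary folder.

```python
import os
import tempfile

def total_file_size(folder):
    """Return the combined size in bytes of every file under folder."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'a.txt'), 'wb') as f:
        f.write(b'12345')            # 5 bytes at the top level
    with open(os.path.join(tmp, 'sub', 'b.txt'), 'wb') as f:
        f.write(b'1234567890')       # 10 bytes inside a subfolder
    size = total_file_size(tmp)

print(size)  # 15
```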


# Checking Path Validity
The `os.path` module provides functions to check whether a given path exists and whether it is a file or folder.
* Calling `os.path.exists(path)` will return True if the file or folder referred to in the argument exists and will return False if it does not exist.
* Calling `os.path.isfile(path)` will return True if the path argument exists and is a file and will return False otherwise. 
* Calling `os.path.isdir(path)` will return True if the path argument exists and is a folder and will return False otherwise.


In [50]:
import os

os.path.exists('./dummy_directories')

True

In [51]:
os.path.exists('./testssssss')

False

In [52]:
os.path.isdir('./dummy_directories/')

True

In [53]:
os.path.isfile('./dummy_directories')

False

In [54]:
os.path.isdir('anscombe.json')

False

In [55]:
os.path.exists('./test/anscombe.json')

False
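The three checks can be compared on paths whose state is known in advance. The sketch below creates a temporary folder with one file in it and tests the folder, the file, and a path that does not exist. The file names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'anscombe.json')
    with open(file_path, 'w') as f:
        f.write('{}')
    missing_path = os.path.join(tmp, 'missing.json')

    folder_checks = (os.path.exists(tmp), os.path.isdir(tmp), os.path.isfile(tmp))
    file_checks = (os.path.exists(file_path), os.path.isdir(file_path), os.path.isfile(file_path))
    missing_checks = (os.path.exists(missing_path), os.path.isdir(missing_path), os.path.isfile(missing_path))

# Each tuple is (exists, isdir, isfile)
print(folder_checks)   # (True, True, False)
print(file_checks)     # (True, False, True)
print(missing_checks)  # (False, False, False)
```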

# The File Reading/Writing Process
There are three steps to reading or writing files in Python. 

1.   Call the `open()` function to return a File object. 
2.   Call the `read()` or `write()` method on the File object. 
3.   Close the file by calling the `close()` method on the File object.
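The three steps can be sketched as follows. The first half spells them out explicitly, with `try`/`finally` so that the file is closed even if the write fails. The second half uses a `with` statement, which performs the close step automatically when the block ends. The file lives in a temporary folder, and the variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'hello.txt')

    # Steps 1-3 written out by hand
    f = open(path, 'w')          # 1. open
    try:
        f.write('Hello World!')  # 2. write
    finally:
        f.close()                # 3. close

    # The same pattern for reading, with the close step handled by `with`
    with open(path) as f2:
        content = f2.read()
    closed_after_with = f2.closed

print(content)            # Hello World!
print(closed_after_with)  # True
```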

## Opening Files with the `open()` Function
The `open()` function returns a File object.

---



In [56]:
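nameFile = './hello.txt'
helloFile = open(nameFile,'w')  # 'w' creates the file, or empties it if it already exists
helloFile.close()               # close the file before it is reopened below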

In [57]:
with open(nameFile,'a') as f:
    f.write('Hello World!')
# No close() call is needed: the with statement closes the file automatically

In [58]:
with open(nameFile,'a') as f:
    # write() is the right call for a single string; writelines() expects a list of strings
    f.write('\n-122.050000,37.370000,27.000000,3885.000000,661.000000,1537.000000,606.000000,6.608500,344700.000000')

# Reading the Contents of Files
If you want to read the entire contents of a file as a string value, use the File object’s `read()` method.

In [59]:
helloFile = open(nameFile,'r')
helloContent = helloFile.read()
helloFile.close()
print(helloContent)

Hello World!
-122.050000,37.370000,27.000000,3885.000000,661.000000,1537.000000,606.000000,6.608500,344700.000000


Alternatively, you can use the `readlines()` method to get a list of string values from the file, one string for each line of text.

In [60]:
with open('./connet29.txt','w') as s:
    # The with statement closes the file, so no explicit close() is needed
    s.write('When, in disgrace with fortune and men\'s eyes, \n I all alone beweep my outcast state, \n And trouble deaf heaven with my bootless cries, \n And look upon myself and curse my fate,')

In [61]:
sonnetFile = open('./connet29.txt')
sonnetFile.readlines()

["When, in disgrace with fortune and men's eyes, \n",
 ' I all alone beweep my outcast state, \n',
 ' And trouble deaf heaven with my bootless cries, \n',
 ' And look upon myself and curse my fate,']
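Each string returned by `readlines()` keeps its trailing newline character, as the output above shows. When the newlines are not wanted, a common alternative is to loop over the File object itself and strip each line. The sketch below compares the two on a two-line file written to a temporary folder; the file name is illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sonnet.txt')
    with open(path, 'w') as f:
        f.write("When, in disgrace with fortune and men's eyes,\n"
                "I all alone beweep my outcast state,\n")

    # readlines() keeps the '\n' at the end of each line
    with open(path) as f:
        raw_lines = f.readlines()

    # Iterating over the file yields one line at a time; rstrip removes the '\n'
    with open(path) as f:
        clean_lines = [line.rstrip('\n') for line in f]

print(raw_lines)
print(clean_lines)
```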

# Writing to Files
Python allows you to write content to a file in a way similar to how the `print()` function “writes” strings to the screen. You can’t write to a file you’ve opened in read mode, though. Instead, you need to open it in “write plaintext” mode or “append plaintext” mode, or write mode and append mode for short.

* Pass `'a'` as the second argument to `open()` to open the file in append mode. Append mode will append text to the end of the existing file.
* Pass `'w'` as the second argument to `open()` to open the file in write mode. Write mode will overwrite the existing file and start from scratch.

If the filename passed to `open()` does not exist, both write and append mode will create a new, blank file.

Call the `close()` method before opening the file again.

In [62]:
# Example 1
baconFile = open('bacon.txt','w')
baconFile.write('Hello world!\n')
baconFile.close()

In [63]:
# Example 2: the same write using a with statement, which closes the file automatically
with open('bacon.txt','w') as baconFile:
    baconFile.write('Hello world!\n')

In [64]:
baconFile = open('bacon.txt','a')
baconFile.write('Bacon is not a vegetable.')
baconFile.close()

In [65]:
baconFile = open('bacon.txt')
content = baconFile.read()
baconFile.close()
print(content)

Hello world!
Bacon is not a vegetable.
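The difference between the modes can be checked in one short run. The sketch below writes to a file, appends to it, overwrites it, and finally tries to write through a handle opened in read mode, which raises `io.UnsupportedOperation`. The file is created in a temporary folder, and the text written to it is placeholder content.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bacon.txt')

    with open(path, 'w') as f:      # write mode creates the file
        f.write('first\n')
    with open(path, 'a') as f:      # append mode adds to the end
        f.write('second\n')
    with open(path) as f:
        after_append = f.read()

    with open(path, 'w') as f:      # write mode again: old content is discarded
        f.write('third\n')
    with open(path) as f:
        after_write = f.read()

    with open(path) as f:           # read mode (the default) rejects writes
        try:
            f.write('fourth\n')
            write_blocked = False
        except io.UnsupportedOperation:
            write_blocked = True

print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'third\n'
print(write_blocked)       # True
```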
