# **Working with Files** 📁



## File Handling

The key function for working with files in Python is the `open()` function.

The `open()` function takes two main parameters: the file name and the mode.

```
open(fileLocation, modeOfAccessingFiles)
```

There are four different modes for opening a file:

* "r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist

* "a" - Append - Opens a file for appending; creates the file if it does not exist

* "w" - Write - Opens a file for writing; creates the file if it does not exist and erases its contents if it does

* "x" - Create - Creates the specified file; raises an error if the file already exists
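
The differences between these modes are easiest to see side by side. The following is a minimal sketch that works in a temporary directory so that no real files are touched; the file name `demo.txt` and the variable names are illustrative assumptions, not part of the notebook.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in the notebook's folder changes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")   # hypothetical file name

    # "w" creates the file (or empties it if it already exists).
    with open(path, "w") as f:
        f.write("first\n")

    # "a" keeps the existing content and adds to the end.
    with open(path, "a") as f:
        f.write("second\n")

    # "r" reads the file back; it now holds both lines.
    with open(path, "r") as f:
        after_append = f.read()

    # "x" refuses to overwrite: the file exists, so FileExistsError is raised.
    try:
        open(path, "x")
        x_failed = False
    except FileExistsError:
        x_failed = True

    # Opening with "w" again discards what was there.
    with open(path, "w") as f:
        f.write("replaced\n")
    with open(path, "r") as f:
        after_rewrite = f.read()

print(repr(after_append))   # 'first\nsecond\n'
print(x_failed)             # True
print(repr(after_rewrite))  # 'replaced\n'
```

The last step is the one to watch for in practice: opening an existing file with "w" silently erases it, while "a" and "x" never destroy existing content.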

<hr>

* "t" - Text - Default value. Text mode

* "b" - Binary - Binary mode (e.g. images)
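
The text and binary flags combine with the modes above (for example "rb" or "wt"). A small sketch of the difference, assuming a hypothetical file `bytes_demo.bin` in a temporary directory: binary mode works with raw `bytes`, while text mode decodes them into a `str`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bytes_demo.bin")   # hypothetical file name

    # "wb" expects bytes objects, not str.
    with open(path, "wb") as f:
        f.write(b"caf\xc3\xa9\n")   # the UTF-8 encoding of "café" plus a newline

    # "rb" returns the raw bytes exactly as stored.
    with open(path, "rb") as f:
        raw = f.read()

    # "rt" (the same as "r") decodes the bytes into a str.
    with open(path, "rt", encoding="utf-8") as f:
        text = f.read()

print(type(raw).__name__, len(raw))    # bytes 6
print(type(text).__name__, len(text))  # str 5
```

The lengths differ because the accented character takes two bytes in UTF-8 but counts as a single character once decoded.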

In [None]:
# Creating a file with the open() function

f = open('abc.txt','a')     # Open in append mode; the file is created if missing
f.write('Good morning :) \n')    # Write text to the file
f.close()                   # Close the file

In [None]:
# Reading the file

f = open('abc.txt','r')     # Open the file for reading
#d = repr(f.read())
#print(d)
listStr = f.readlines()     # Read all lines into a list of strings
f.close()

In [None]:
listStr[-1]    # Last line of the file; a fixed index such as 3 fails if the file has fewer lines

'Good morning :) \n'

In [None]:
f = open('abc.txt','w')     # Open in write mode; existing content is erased
for i in range(5):
    f.write('Hello World\n')    # Write the same line five times
f.close()

In [None]:
# Reading the file

f = open('abc.txt','r')     # Open the file for reading
print(f.readlines())
f.close()

['Hello World\n', 'Hello World\n', 'Hello World\n', 'Hello World\n', 'Hello World\n']
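
The cells above call `close()` by hand. A `with` block closes the file automatically, even if an error occurs inside it, and is the form used later in this notebook for `housing.csv`. A minimal sketch, assuming a hypothetical file `with_demo.txt` in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_demo.txt")   # hypothetical file name

    # The file is closed automatically when the with block ends.
    with open(path, "w") as f:
        for _ in range(3):
            f.write("Hello World\n")

    # Iterating over the file object yields one line at a time,
    # so a large file need not be loaded into memory all at once.
    lines = []
    with open(path, "r") as f:
        for line in f:
            lines.append(line.rstrip("\n"))   # drop the trailing newline

    closed_after_block = f.closed

print(lines)               # ['Hello World', 'Hello World', 'Hello World']
print(closed_after_block)  # True
```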


# **Working with OS Module commands** 🌄

In [None]:
import os  # Standard-library module for working with files and directories

## OS Basic Functions 

In [None]:
os.listdir('sample_data/')           # List the contents of a directory

['anscombe.json',
 'README.md',
 'california_housing_train.csv',
 'mnist_train_small.csv',
 'mnist_test.csv',
 'california_housing_test.csv']

In [None]:
os.getcwd()               # Get the current working directory

'/content'

In [None]:
os.chdir('..')                # Change the working directory to the parent directory

In [None]:
os.getcwd()

'/content'

In [None]:
os.mkdir('New Folder')    # Create a new directory

In [None]:
os.rmdir('New Folder')    # Remove a directory (it must be empty)

In [None]:
os.remove('abc.txt')      # Remove a file
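
Because `os.remove()` and `os.rmdir()` delete things permanently, it can help to try them in a scratch location first. The sketch below does this in a temporary directory (the folder and file names simply mirror the cells above) and shows that `os.rmdir()` refuses to remove a directory that still contains a file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "New Folder")
    file_path = os.path.join(folder, "abc.txt")

    os.mkdir(folder)                      # create the directory
    with open(file_path, "w") as f:       # put one file inside it
        f.write("Good morning :)\n")

    listing = os.listdir(folder)

    # os.rmdir() only removes empty directories, so this attempt fails.
    try:
        os.rmdir(folder)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    os.remove(file_path)                  # delete the file first ...
    os.rmdir(folder)                      # ... then the now-empty directory
    still_exists = os.path.exists(folder)

print(listing)        # ['abc.txt']
print(rmdir_failed)   # True
print(still_exists)   # False
```

To remove a directory together with everything inside it, the standard library also provides `shutil.rmtree()`.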

## OS Walk

The `os.walk()` function generates the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory it visits, it yields a tuple `(root, dirs, files)`.

In [None]:
import os 

In [None]:
for root, dirs, files in os.walk(".", topdown=False):
    #print(len(files))
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))

./.config/configurations/config_default
./.config/logs/2022.08.03/20.20.58.507230.log
./.config/logs/2022.08.03/20.20.57.728033.log
./.config/logs/2022.08.03/20.20.37.810163.log
./.config/logs/2022.08.03/20.20.11.079200.log
./.config/logs/2022.08.03/20.19.49.687892.log
./.config/logs/2022.08.03/20.20.30.273467.log
./.config/logs/2022.08.03
./.config/.last_survey_prompt.yaml
./.config/active_config
./.config/gce
./.config/.last_update_check.json
./.config/config_sentinel
./.config/.last_opt_in_prompt.yaml
./.config/configurations
./.config/logs
./sample_data/anscombe.json
./sample_data/README.md
./sample_data/mnist_test.csv
./sample_data/california_housing_test.csv
./sample_data/california_housing_train.csv
./sample_data/mnist_train_small.csv
./.config
./sample_data
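
The effect of `topdown` is easier to see on a tiny tree than in the listing above. A minimal sketch, assuming a hypothetical structure `parent/child/leaf.txt` built inside a temporary directory: top-down visits a directory before its subdirectories, bottom-up visits it after them.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: tmp/parent/child/leaf.txt (hypothetical names)
    child = os.path.join(tmp, "parent", "child")
    os.makedirs(child)
    with open(os.path.join(child, "leaf.txt"), "w") as f:
        f.write("x")

    # Record the directory names in the order os.walk() visits them,
    # leaving out the temporary root itself.
    top_down = [os.path.basename(root)
                for root, dirs, files in os.walk(tmp) if root != tmp]
    bottom_up = [os.path.basename(root)
                 for root, dirs, files in os.walk(tmp, topdown=False) if root != tmp]

    # Count every file in the tree.
    total_files = sum(len(files) for root, dirs, files in os.walk(tmp))

print(top_down)     # ['parent', 'child']
print(bottom_up)    # ['child', 'parent']
print(total_files)  # 1
```

Bottom-up order is the one to use when deleting a tree by hand, since each directory is reached only after its contents.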


# **Getting hands on the Data** 👐

**California Housing**

This is a dataset obtained from the StatLib repository. Here is the included description:

S&P Letters Data
We collected information on the variables using all the block groups in California from the 1990 Census. In this sample a block group on average includes 1425.5 individuals living in a geographically compact area. Naturally, the geographical area included varies inversely with the population density. We computed distances among the centroids of each block group as measured in latitude and longitude. We excluded all the block groups reporting zero entries for the independent and dependent variables. The final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value).

```
                               Bols      tols
INTERCEPT                      11.4939   275.7518
MEDIAN INCOME                   0.4790    45.7768
MEDIAN INCOME^2                -0.0166    -9.4841
MEDIAN INCOME^3                -0.0002    -1.9157
ln(MEDIAN AGE)                  0.1570    33.6123
ln(TOTAL ROOMS/ POPULATION)    -0.8582   -56.1280
ln(BEDROOMS/ POPULATION)        0.8043    38.0685
ln(POPULATION/ HOUSEHOLDS)     -0.4077   -20.8762
ln(HOUSEHOLDS)                  0.0477    13.0792
```
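
Read as a regression, the table corresponds to the fitted equation below. This is only a sketch under two assumptions that the description does not spell out: that the first numeric column holds the ordinary least squares coefficient estimates (with the second column their t-statistics), and that the second and third income rows are the squared and cubed median income.

$$
\begin{aligned}
\ln(\text{median house value}) \approx{} & 11.4939 + 0.4790\,I - 0.0166\,I^{2} - 0.0002\,I^{3} + 0.1570\,\ln(A) \\
& - 0.8582\,\ln(R/P) + 0.8043\,\ln(B/P) - 0.4077\,\ln(P/H) + 0.0477\,\ln(H)
\end{aligned}
$$

Here $I$ is the median income, $A$ the median age, $R$ the total rooms, $B$ the bedrooms, $P$ the population and $H$ the number of households of a block group.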

In [None]:
!wget "https://raw.githubusercontent.com/amansingh9097/California-housing-price-prediction/master/datasets/housing/housing.csv"

--2022-08-08 03:00:03--  https://raw.githubusercontent.com/amansingh9097/California-housing-price-prediction/master/datasets/housing/housing.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1444170 (1.4M) [text/plain]
Saving to: ‘housing.csv’


2022-08-08 03:00:03 (24.9 MB/s) - ‘housing.csv’ saved [1444170/1444170]



In [None]:
import csv

In [None]:
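with open('housing.csv', 'r', newline='') as f:  # newline='' is recommended when using the csv module
    reader = csv.DictReader(f, skipinitialspace=True)
    data = list(reader)    # Read every row into a list of dictionaries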
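
To see what `csv.DictReader` produces without downloading anything, the sketch below feeds it a two-row, in-memory stand-in for the file. The values are made up for illustration; only the column names are borrowed from `housing.csv`. Note that every value comes back as a string.

```python
import csv
import io

# A two-row stand-in for housing.csv (made-up values).
sample = io.StringIO(
    "longitude, latitude, ocean_proximity\n"
    "-122.23, 37.88, NEAR BAY\n"
    "-118.30, 34.05, <1H OCEAN\n"
)

# skipinitialspace=True drops the blank that follows each comma.
rows = list(csv.DictReader(sample, skipinitialspace=True))

print(list(rows[0].keys()))        # ['longitude', 'latitude', 'ocean_proximity']
print(rows[0]["latitude"])         # 37.88
print(type(rows[0]["latitude"]))   # <class 'str'>
print(rows[1]["ocean_proximity"])  # <1H OCEAN
```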

In [None]:
type(data)          # Getting the type of Data

list

In [None]:
type(data[0])       # Getting the type of Data at [0]

collections.OrderedDict

In [None]:
len(data)           # Getting the length of the data (number of rows)

20640

In [None]:
data[0].keys()      # Getting the column names

odict_keys(['longitude', 'latitude', 'housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households', 'median_income', 'median_house_value', 'ocean_proximity'])

In [None]:
for row in data[:10]:
    print(row['housing_median_age'])

41.0
21.0
52.0
52.0
52.0
52.0
52.0
52.0
42.0
52.0
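
The ages printed above are strings such as `'41.0'`, so they have to be converted before any arithmetic. A minimal sketch using a small hand-written list in place of `data`; the rows and the helper `to_float` are illustrative assumptions. It also treats an empty field as missing, which may or may not be needed for a given column.

```python
# Stand-in for the list of dictionaries produced by csv.DictReader (made-up rows).
sample_rows = [
    {"housing_median_age": "41.0", "total_bedrooms": "129.0"},
    {"housing_median_age": "21.0", "total_bedrooms": ""},      # empty field
    {"housing_median_age": "52.0", "total_bedrooms": "190.0"},
]

def to_float(text):
    """Hypothetical helper: return a float, or None for an empty field."""
    return float(text) if text.strip() else None

ages = [to_float(row["housing_median_age"]) for row in sample_rows]
bedrooms = [to_float(row["total_bedrooms"]) for row in sample_rows]

# Skip missing values before aggregating.
known_bedrooms = [b for b in bedrooms if b is not None]

print(ages)                 # [41.0, 21.0, 52.0]
print(max(ages))            # 52.0
print(sum(known_bedrooms))  # 319.0
```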


# **IN CLASS TASK** 💻

In [None]:
# TASK 1 - Create a File named 'test.txt' and Write 5 Lines on Data Science Trends

In [None]:
# TASK 2 - Read only the first 20 characters from the created file 'test.txt'

In [None]:
# TASK 3 - Append more data about the weather conditions to the created 'test.txt'

In [None]:
# Sub Task - Create a folder using an os command and save files with Data Science information inside it
# Folder name : New Folder
# File name format : [name]-[time].txt
# Get the data from the following link and write it into your files
# link - https://raw.githubusercontent.com/Ambatkar/Test_mnist/main/data/dummy.txt
# Replace every 'a' with 'e' and every `'` with `"`.

In [None]:
# TASK 4 - Find the house with the highest median_income
# and print the longitude and latitude of that house

In [None]:
# TASK 5 - Find the distinct ocean_proximity values

In [None]:
# TASK 6 - Count all the ocean_proximity values and print them

In [None]:
# TASK 7 - Count the total rooms and bedrooms in California
# and compute the number of persons per bedroom

In [None]:
import requests

# Use a separate name so the housing `data` list is not overwritten
response = requests.get("https://raw.githubusercontent.com/Ambatkar/Test_mnist/main/data/dummy.txt")
strGet = response.text

In [None]:
strGet

"It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like).\n\n"

<Response [200]>