
# Reading from and Writing to a File Using Python

In [1]:
import os

In [2]:
#Check the present working directory
os.getcwd()

'/content'

In [3]:
#return a list containing names of files in a directory
os.listdir('.') # relative path

['.config', 'sample_data']

In [4]:
os.listdir('/usr') # absolute path

['local',
 'include',
 'games',
 'sbin',
 'lib',
 'share',
 'bin',
 'src',
 'grte',
 'lib32']
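
The two `os.listdir` calls above differ only in how the path is written. As a small sketch of that distinction (the variable names here are illustrative, not from the notebook), `os.path.isabs` reports whether a path is absolute, and `os.path.abspath` resolves a relative path against the current working directory:

```python
import os

relative = '.'                         # interpreted relative to os.getcwd()
absolute = os.path.abspath(relative)   # the same location as a full path

print(os.path.isabs(relative))
print(os.path.isabs(absolute))
print(absolute)
```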

In [5]:
#Creating a new directory
os.makedirs('./data', exist_ok=True)

In [6]:
os.listdir('.')

['.config', 'data', 'sample_data']

In [7]:
os.listdir('./data')

[]

In [8]:
#downloading some data
url1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [9]:
#We use urlretrieve to download files into our directory
from urllib.request import urlretrieve

In [10]:
urlretrieve(url1, './data/loans1.txt')
urlretrieve(url2, './data/loans2.txt')
urlretrieve(url3, './data/loans3.txt')

('./data/loans3.txt', <http.client.HTTPMessage at 0x7fa7052a7290>)

In [11]:
os.listdir('./data')

['loans3.txt', 'loans1.txt', 'loans2.txt']

To read from a file, we first have to open it. Once we're done interacting with it, we have to close it too; otherwise it keeps holding on to system resources such as memory.

In [12]:
#Reading from a file
file1 = open('./data/loans1.txt', mode='r')

In [13]:
#Viewing content of a file
file1_content = file1.read()

In [14]:
print(file1_content)

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


In [15]:
file1.close()
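
Closing by hand, as above, leaves the file open if an error occurs between `open()` and `close()`. One common guard is `try`/`finally`. The following is a minimal sketch that uses a hypothetical scratch file (`example.txt` in a temporary directory) rather than the loans data, so it runs on its own:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
scratch_path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(scratch_path, 'w') as f:
    f.write('amount,duration\n100000,36\n')

handle = open(scratch_path, mode='r')
try:
    content = handle.read()
finally:
    # Runs whether or not read() raised, so the file is always closed.
    handle.close()

print(content)
print(handle.closed)
```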

**Processing Data from a file**

Before performing any operations on the data stored in a file, we need to convert the file's contents from one large string into Python data types. We can do the following:

- Read the file line by line
- Parse the first line to get a list of the column names or headers
- Split each remaining line and convert each value into a float
- Create a dictionary for each loan using the headers as keys
- Create a list of dictionaries to keep track of all the loans

We can also open, read, and close a file in one step using a `with` statement, which automatically closes the file once the block finishes.
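
To see the automatic closing in action, here is a small sketch. It assumes a hypothetical scratch file instead of the loans data, and it raises an error inside the `with` block on purpose to show that the file is closed even then:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
scratch_path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(scratch_path, 'w') as f:
    f.write('amount,duration\n100000,36\n')

try:
    with open(scratch_path) as handle:
        first_line = handle.readline()
        # Simulate a failure while the file is still open.
        raise ValueError('simulated parsing error')
except ValueError:
    pass

# The with block closed the file on the way out, despite the error.
print(first_line)
print(handle.closed)
```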

In [16]:
#Function to open the file, read its content and then close it
def read_file(file_path):
  with open(file_path) as file2:
    file_content = file2.readlines()
  return file_content

In [17]:
file2_content = read_file('./data/loans2.txt')
file2_content

['amount,duration,rate,down_payment\n',
 '828400,120,0.11,100000\n',
 '4633400,240,0.06,\n',
 '42900,90,0.08,8900\n',
 '983000,16,0.14,\n',
 '15230,48,0.07,4300']

In [18]:
file3_content = read_file('./data/loans3.txt')
file3_content

['amount,duration,rate,down_payment\n',
 '45230,48,0.07,4300\n',
 '883000,16,0.14,\n',
 '100000,12,0.1,\n',
 '728400,120,0.12,100000\n',
 '3637400,240,0.06,\n',
 '82900,90,0.07,8900\n',
 '316000,16,0.13,\n',
 '15230,48,0.08,4300\n',
 '991360,99,0.08,\n',
 '323000,27,0.09,4720010000,36,0.08,20000\n',
 '528400,120,0.11,100000\n',
 '8633400,240,0.06,\n',
 '12900,90,0.08,8900']

In [19]:
#Function to split an input line as a list of column headers.
# .strip() removes leading and trailing whitespace, including the newline character
# .split() splits the string at the separator passed as an argument
def parse_header(header_line):
  return header_line.strip().split(',')
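
As a quick illustration of the two string methods used here, the sketch below applies them to lines copied from the file contents shown earlier. Note how a trailing comma produces an empty string as the last field:

```python
header_line = 'amount,duration,rate,down_payment\n'
stripped = header_line.strip()    # drops the trailing '\n'
columns = stripped.split(',')     # cuts the string at every comma

# A data line with a missing down payment ends in a comma,
# so the last field comes back as an empty string.
row = '200000,12,0.1,\n'.strip().split(',')

print(columns)
print(row)
```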

In [20]:
file3_content[0]

'amount,duration,rate,down_payment\n'

In [21]:
headers = parse_header(file3_content[0])
headers

['amount', 'duration', 'rate', 'down_payment']

In [22]:
#Function that takes a data line as an argument and returns a list of floating point numbers.
#Empty fields become 0.0; fields that cannot be converted to a float are kept as strings.
def parse_value(data_line):
  data_line = data_line.strip().split(',')
  values = []
  for item in data_line:
    if item == '':
      values.append(0.0)
    else:
      try:
        values.append(float(item))
      except ValueError:
        values.append(item)
  return values
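
The function has three branches: an empty field, a numeric field, and a field that is not a number. The sketch below repeats the function so it runs on its own and exercises all three. The second input line, with `n/a` in it, is a made-up example and does not come from the loans files:

```python
def parse_value(data_line):
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(0.0)           # missing value
        else:
            try:
                values.append(float(item))
            except ValueError:
                values.append(item)      # keep non-numeric text unchanged
    return values

numeric_row = parse_value('883000,16,0.14,\n')
mixed_row = parse_value('100000,n/a,0.08,20000\n')  # hypothetical line

print(numeric_row)
print(mixed_row)
```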


In [23]:
file3_content[2]

'883000,16,0.14,\n'

In [24]:
temp = parse_value(file3_content[2])
temp

[883000.0, 16.0, 0.14, 0.0]

In [25]:
#Function to create dictionary items of the data
def create_dict_item(values, headers):
  result = {}
  for value, header in zip(values, headers):
    result[header] = value
  return result
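
One detail worth knowing about `zip()` is that it stops at the shorter of its inputs. The sketch below uses a compact variant of the function (the name `create_dict_item_compact` is hypothetical) and feeds it the seven values from the malformed line in `loans3.txt` shown earlier. The three surplus values are dropped without any error:

```python
headers = ['amount', 'duration', 'rate', 'down_payment']

def create_dict_item_compact(values, headers):
    # dict(zip(...)) pairs each header with the value at the same position.
    return dict(zip(headers, values))

normal = create_dict_item_compact([883000.0, 16.0, 0.14, 0.0], headers)

# Seven values but only four headers: zip() stops after the fourth pair.
long_row = [323000.0, 27.0, 0.09, 4720010000.0, 36.0, 0.08, 20000.0]
truncated = create_dict_item_compact(long_row, headers)

print(normal)
print(truncated)
```

If silently dropping fields is not acceptable, comparing `len(values)` with `len(headers)` before building the dictionary is one way to detect such rows.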

In [26]:
headers

['amount', 'duration', 'rate', 'down_payment']

In [27]:
temp

[883000.0, 16.0, 0.14, 0.0]

In [28]:
create_dict_item(temp, headers)

{'amount': 883000.0, 'down_payment': 0.0, 'duration': 16.0, 'rate': 0.14}

In [29]:
#Putting together the read_file, parse_header, parse_value, create_dict_item functions and defining a new function
def read_csv(file_path):
  result = []
  #Reading lines of a file
  lines = read_file(file_path)
  #Parse the header
  headers = parse_header(lines[0])
  #Looping over the remaining lines
  for line in lines[1:]:
    #Parse values
    values = parse_value(line)
    #Creating a dictionary of values and headers
    item_dict = create_dict_item(values, headers)
    #Add the dictionary item to the result
    result.append(item_dict)
  return result

In [30]:
#Testing it
loan3_dict = read_csv('./data/loans3.txt')
loan3_dict

[{'amount': 45230.0, 'down_payment': 4300.0, 'duration': 48.0, 'rate': 0.07},
 {'amount': 883000.0, 'down_payment': 0.0, 'duration': 16.0, 'rate': 0.14},
 {'amount': 100000.0, 'down_payment': 0.0, 'duration': 12.0, 'rate': 0.1},
 {'amount': 728400.0,
  'down_payment': 100000.0,
  'duration': 120.0,
  'rate': 0.12},
 {'amount': 3637400.0, 'down_payment': 0.0, 'duration': 240.0, 'rate': 0.06},
 {'amount': 82900.0, 'down_payment': 8900.0, 'duration': 90.0, 'rate': 0.07},
 {'amount': 316000.0, 'down_payment': 0.0, 'duration': 16.0, 'rate': 0.13},
 {'amount': 15230.0, 'down_payment': 4300.0, 'duration': 48.0, 'rate': 0.08},
 {'amount': 991360.0, 'down_payment': 0.0, 'duration': 99.0, 'rate': 0.08},
 {'amount': 323000.0,
  'down_payment': 4720010000.0,
  'duration': 27.0,
  'rate': 0.09},
 {'amount': 528400.0,
  'down_payment': 100000.0,
  'duration': 120.0,
  'rate': 0.11},
 {'amount': 8633400.0, 'down_payment': 0.0, 'duration': 240.0, 'rate': 0.06},
 {'amount': 12900.0, 'down_payment': 890

The complete code to read a CSV file and turn its rows into dictionaries is collected below.

In [31]:
def read_file(file_path):
  with open(file_path) as file2:
    file_content = file2.readlines()
  return file_content

def parse_header(header_line):
  return header_line.strip().split(',')

def parse_value(data_line):
  data_line = data_line.strip().split(',')
  values = []
  for item in data_line:
    if item == '':
      values.append(0.0)
    else:
      try:
        values.append(float(item))
      except ValueError:
        values.append(item)
  return values

def create_dict_item(values, headers):
  result = {}
  for value, header in zip(values, headers):
    result[header] = value
  return result


def read_csv(file_path):
  result = []
  #Reading lines of a file
  lines = read_file(file_path)
  #Parse the header
  headers = parse_header(lines[0])
  #Looping over the remaining lines
  for line in lines[1:]:
    #Parse values
    values = parse_value(line)
    #Creating a dictionary of values and headers
    item_dict = create_dict_item(values, headers)
    #Add the dictionary item to the result
    result.append(item_dict)
  return result
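
For comparison, Python's standard library also ships a `csv` module. The following is a rough sketch, not the notebook's method, of the same idea using `csv.DictReader`. The function name `read_csv_with_module` and the inline sample text are assumptions made for illustration. The sample is read from an in-memory string so the block runs without the downloaded files:

```python
import csv
import io

# Inline sample in the same layout as the loans files (illustrative only).
sample = (
    'amount,duration,rate,down_payment\n'
    '828400,120,0.11,100000\n'
    '4633400,240,0.06,\n'
)

def read_csv_with_module(text_stream):
    result = []
    # DictReader uses the first line as the keys for every following row.
    for row in csv.DictReader(text_stream):
        # Same rule as parse_value: empty fields become 0.0.
        result.append({
            key: (float(value) if value != '' else 0.0)
            for key, value in row.items()
        })
    return result

loans = read_csv_with_module(io.StringIO(sample))
print(loans)
```

This sketch assumes every row has exactly as many fields as the header. `DictReader` collects surplus fields under a separate key, so a malformed row would need extra handling.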

In [32]:
#Writing to files
loans2_dict = read_csv('./data/loans2.txt')
loans2_dict

[{'amount': 828400.0,
  'down_payment': 100000.0,
  'duration': 120.0,
  'rate': 0.11},
 {'amount': 4633400.0, 'down_payment': 0.0, 'duration': 240.0, 'rate': 0.06},
 {'amount': 42900.0, 'down_payment': 8900.0, 'duration': 90.0, 'rate': 0.08},
 {'amount': 983000.0, 'down_payment': 0.0, 'duration': 16.0, 'rate': 0.14},
 {'amount': 15230.0, 'down_payment': 4300.0, 'duration': 48.0, 'rate': 0.07}]

In [48]:
with open('./data/output.txt', 'w') as f:
  for val in loans2_dict:
    f.write(f"{val['amount']},{val['down_payment']},{val['duration']},{val['rate']}\n")

In [49]:
os.listdir('data')

['output.txt', 'loans3.txt', 'loans1.txt', 'loans2.txt']

In [51]:
with open('./data/output.txt', 'r') as x:
  print(x.read())

828400.0,100000.0,120.0,0.11
4633400.0,0.0,240.0,0.06
42900.0,8900.0,90.0,0.08
983000.0,0.0,16.0,0.14
15230.0,4300.0,48.0,0.07



In [75]:
#Function to write to a file
def write_csv(items, path):
  #Open the file in write mode
  with open(path, 'w') as x:
    #If there is nothing to write, return
    if len(items) == 0:
      return

    #Write the header in first line
    headers = list(items[0].keys())
    x.write(','.join(headers) + '\n')

    #Write one item per line
    for item in items:
      values = []
      curr = list(item.values())
      for a in curr:
        values.append(str(a))
      x.write(','.join(values) + "\n")
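
The writing step can likewise be sketched with the standard library's `csv.DictWriter`. This is an alternative for comparison, not the notebook's approach. The function name `write_csv_with_module`, the sample records, and the temporary output path are all assumptions for illustration:

```python
import csv
import os
import tempfile

# Two sample records in the same shape as loans2_dict (illustrative only).
loans = [
    {'amount': 828400.0, 'duration': 120.0, 'rate': 0.11, 'down_payment': 100000.0},
    {'amount': 4633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
]

def write_csv_with_module(items, path):
    # If there is nothing to write, return without creating the file.
    if len(items) == 0:
        return
    # newline='' lets the csv writer control line endings itself.
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(
            f,
            fieldnames=list(items[0].keys()),
            lineterminator='\n',  # match the '\n' endings used above
        )
        writer.writeheader()
        writer.writerows(items)

# Hypothetical output location in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'output_sketch.txt')
write_csv_with_module(loans, out_path)

with open(out_path, newline='') as f:
    written = f.read()
print(written)
```

Unlike the hand-written version, `DictWriter` looks values up by key, so the column order stays correct even if a dictionary stores its keys in a different order.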

In [76]:
write_csv(loans2_dict, './data/output2.txt')

In [77]:
os.listdir('data')

['output.txt', 'loans3.txt', 'loans1.txt', 'loans2.txt', 'output2.txt']

In [78]:
with open('./data/output2.txt', 'r') as x:
  print(x.read())

amount,duration,rate,down_payment
828400.0,120.0,0.11,100000.0
4633400.0,240.0,0.06,0.0
42900.0,90.0,0.08,8900.0
983000.0,16.0,0.14,0.0
15230.0,48.0,0.07,4300.0

