# Advanced Python

# Python File Handling: 

## Working with Text Files

In Python, we can effectively manage text files using the `open()` function. This function requires two essential parameters: the `filename` and the `mode`.

## File Modes:

There are four different modes available for opening a file:

1. `"r"` - Read Mode: This is the default mode, used for reading data from an existing file. An error is raised if the file does not exist.

2. `"a"` - Append Mode: This mode allows you to add data to the end of an existing file or create a new one if it doesn't exist.

3. `"w"` - Write Mode: In write mode, you can modify an existing file or create a new one. Be cautious as it truncates the file, erasing its contents.

4. `"x"` - Create Mode: This mode is used for creating a new file with the specified filename. An error is raised if the file already exists.

## File Handling Modes:

You can also specify whether the file should be handled in binary or text mode:

1. `"t"` - Text Mode: This is the default mode, and it handles the file as a text file.

2. `"b"` - Binary Mode: This mode handles the file as a binary file.

In [69]:
file_obj = open('sample-7766.txt', 'r')  

## Reading Text Files
### read()
By default the read() method returns the whole text

In [70]:
type(data)

str

In [50]:
# Open the file in read mode
file_obj = open('sample-7766.txt', 'r')

# Read the first 10 bytes from the file
data2 = file_obj.read(10)  # If the number of bytes is not given, it reads the entire file

# Print the content read from the file
print(data2)

# Close the file
file_obj.close()


Lorem ipsu


In [51]:
# Open the file in read mode
file_obj = open('sample-7766.txt', 'r')

# Read the first line from the file
print(file_obj.readline())

# Read the second line from the file
print(file_obj.readline())

# Read up to 10000 characters from the current line
# If the line has fewer than 10000 characters, it will return the entire line
print(file_obj.readline(10000))

# Close the file
file_obj.close()


Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet luctus ligula laoreet ut. Vestibulum ullamcorper accumsan velit vel vehicula. Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum mi. In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus pellentesque. Nullam id elementum ipsum. Suspendisse cursus lobortis viverra. Proin et erat at mauris tincidunt porttitor vitae ac dui.

Donec vulputate lorem tortor, nec fermentum nibh bibendum vel. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent dictum luctus massa, non euismod lacus. Pellentesque condimentum dolor est, ut dapibus lectus luctus ac. Ut sagittis commodo arcu. Integer nisi nulla, facilisis sit amet nulla quis, eleifend suscipit purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel. Phasellus dictum j

In [53]:
# Open the file in read mode
file_obj = open('sample-7766.txt', 'r')

# Read the entire file and store its lines in a list
file_data_in_list = file_obj.readlines()

# Close the file
file_obj.close()

# Print the type of the variable holding the file lines (should be list)
print(type(file_data_in_list))


<class 'list'>


In [54]:
print(len(file_data_in_list))

14


In [55]:
file_data_in_list[2]

'Nulla luctus sem sit amet nisi consequat, id ornare ipsum dignissim. Sed elementum elit nibh, eu condimentum orci viverra quis. Aenean suscipit vitae felis non suscipit. Suspendisse pharetra turpis non eros semper dictum. Etiam tincidunt venenatis venenatis. Praesent eget gravida lorem, ut congue diam. Etiam facilisis elit at porttitor egestas. Praesent consequat, velit non vulputate convallis, ligula diam sagittis urna, in venenatis nisi justo ut mauris. Vestibulum posuere sollicitudin mi, et vulputate nisl fringilla non. Nulla ornare pretium velit a euismod. Nunc sagittis venenatis vestibulum. Nunc sodales libero a est ornare ultricies. Sed sed leo sed orci pellentesque ultrices. Mauris sollicitudin, sem quis placerat ornare, velit arcu convallis ligula, pretium finibus nisl sapien vel sem. Vivamus sit amet tortor id lorem consequat hendrerit. Nullam at dui risus.\n'

In [57]:
# it is our responsiblity to close the file

file_obj = open('sample-7766.txt', 'r')
file_data_in_list =file_obj.readlines()

print(type(file_data_in_list))

file_obj.close() # buffer is erased

file_obj.read() #error

<class 'list'>


ValueError: I/O operation on closed file.

In [58]:
# Read the entire file and store its lines in a list
with open('sample-7766.txt', 'r') as file_obj:  # Open the file using 'with' to ensure automatic file closure
    file_data = file_obj.readlines()  

# Access the first line from the file data list
first_line = file_data[0]

print(first_line)


Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet luctus ligula laoreet ut. Vestibulum ullamcorper accumsan velit vel vehicula. Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum mi. In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus pellentesque. Nullam id elementum ipsum. Suspendisse cursus lobortis viverra. Proin et erat at mauris tincidunt porttitor vitae ac dui.



* The with statement does not have its own scope. Instead, it creates a context for the block of code inside it, and any variables defined inside the with block will be in the same scope as the surrounding code. This means that variables defined inside the with block will be accessible both inside the with block and outside it.

# CSV Files

CSV (Comma-Separated Values) files are widely used for storing and managing tabular data, where data is organized in rows and columns. Each row represents one record, and each column represents an attribute of the data.

In Python, the `csv` module provides functionality for working with CSV files, making it easy to read and write data in this format.

To read a CSV file, the `csv.reader` object is commonly used, which allows you to read each row as a list of values. This makes it straightforward to access individual columns and process the data.

While it is possible to use `readlines()` to read a CSV file, this approach may not be ideal, as it returns each row as a single string, making it difficult to separate and work with individual columns effectively. Hence, the `csv.reader` method is preferred for handling CSV files in Python..

In [99]:
# we can read with inbuilt readlines

with open('year2017-7767.csv') as file_obj:
    file_data = file_obj.readlines() #list of strings

# we have to manually seperate the , seperated values for this
file_data[:5]

['Year,Month,Day,Country,Region,city,latitude,longitude,AttackType,Killed,Wounded,Target,Group,Target_type,Weapon_type,casualities\n',
 '2017,1,2,Afghanistan,South Asia,Takhta Pul,31.320556,65.961111,Hostage Taking (Kidnapping),0.0,0.0,Construction Workers,Taliban,Business,Firearms,0.0\n',
 '2017,1,3,Sudan,Sub-Saharan Africa,Fantaga,12.921007000000001,24.318324,Armed Assault,2.0,0.0,"Civilians: Haroun Yousif, Hamid Ibrahim",Unknown,Private Citizens & Property,Firearms,2.0\n',
 '2017,1,1,Democratic Republic of the Congo,Sub-Saharan Africa,Saboko,1.452372,29.875162,Armed Assault,7.0,0.0,Village,Allied Democratic Forces (ADF),Private Citizens & Property,Melee,7.0\n',
 '2017,1,1,Democratic Republic of the Congo,Sub-Saharan Africa,Bialee,1.4523700000000002,29.875186,Armed Assault,7.0,0.0,Village,Allied Democratic Forces (ADF),Private Citizens & Property,Melee,7.0\n']

In [60]:
# we use csv library
import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.reader(file_obj)
print(file_data)
for row in file_data: #cant acess as fileclosed
    print(row)

# type(file_data) _csv.reader

<_csv.reader object at 0x0000018452B11CC0>


ValueError: I/O operation on closed file.

In [105]:
# we use csv library
import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.reader(file_obj)
     for row in file_data:
        print(row)
   
# each row is comma seperated values in list
# type(file_data) _csv.reader


### CSV files with Custom Delimiters

By default, a comma is used as a delimiter in a CSV file. However, some CSV files can use delimiters other than a comma. Few popular ones are | and \t.

In [109]:
import csv

with open('sample_delim-7772.csv') as file_obj:
    file_data = csv.reader(file_obj)
    for row in file_data:
        print(row)

['Year|   Month|    Day|    Country|    Region|    City']
['2017|   1|    2|    Afghanistan|    Region|    Takhta Pul']
['2017|   1|    3|    Sudan|    Sub-Saharan Africa|    Fantaga']
['2017|   1|    1|    Democratic Republic of the Congo|    Region|    Sabako']
['2017|   1|    1|    Democratic Republic of the Congo|    Region|    Bialee']


In [110]:
import csv

with open('sample_delim-7772.csv') as file_obj:
    file_data = csv.reader(file_obj, delimiter='|')
    for row in file_data:
        print(row)

['Year', '   Month', '    Day', '    Country', '    Region', '    City']
['2017', '   1', '    2', '    Afghanistan', '    Region', '    Takhta Pul']
['2017', '   1', '    3', '    Sudan', '    Sub-Saharan Africa', '    Fantaga']
['2017', '   1', '    1', '    Democratic Republic of the Congo', '    Region', '    Sabako']
['2017', '   1', '    1', '    Democratic Republic of the Congo', '    Region', '    Bialee']


Now, we have separated each column successfully. But did you notice that there is some initial space before each column entry. Lets see how we can remove that.

CSV files with initial spaces
Some CSV files can have a space character after a delimiter. When we use the default csv.reader() function to read these CSV files, we will get spaces in the output as well.

To remove these initial spaces, we need to pass an additional parameter called skipinitialspace.

In [112]:
import csv

with open('sample_delim-7772.csv') as file_obj:
    file_data = csv.reader(file_obj, delimiter='|',skipinitialspace=True)
    for row in file_data:
        print(row)

['Year', 'Month', 'Day', 'Country', 'Region', 'City']
['2017', '1', '2', 'Afghanistan', 'Region', 'Takhta Pul']
['2017', '1', '3', 'Sudan', 'Sub-Saharan Africa', 'Fantaga']
['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Sabako']
['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Bialee']


In [113]:
import csv

with open('sample_delim-7772.csv') as file_obj:
    file_data = csv.reader(file_obj, delimiter='|',skipinitialspace=True)
    print(file_data)
    file_list = list(file_data)
    for row in file_data:
        print(row)

print(type(file_list))

<_csv.reader object at 0x000001B1B0455900>
<class 'list'>


In [116]:
import csv

with open('sample_delim-7772.csv') as file_obj:
    file_data = csv.reader(file_obj, delimiter='|',skipinitialspace=True)
    print(file_data)
    file_list = list(file_data)

print(type(file_list))
print(file_list)



<_csv.reader object at 0x000001B1B0455C60>
<class 'list'>
[['Year', 'Month', 'Day', 'Country', 'Region', 'City'], ['2017', '1', '2', 'Afghanistan', 'Region', 'Takhta Pul'], ['2017', '1', '3', 'Sudan', 'Sub-Saharan Africa', 'Fantaga'], ['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Sabako'], ['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Bialee']]


In [119]:
import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.reader(file_obj, skipinitialspace=True)
    file_list = list(file_data)

# the values are strings, so we can use int or float
killed=[]
for row in file_list[1:]:
    killed.append(float(row[9]))
killed

ValueError: could not convert string to float: ''

In [61]:
# Above error is caused because there may be values which are missing "" i.e empty strings

In [63]:
import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.reader(file_obj, skipinitialspace=True)
    file_list = list(file_data)

# the values are strings, so we can use int or float
killed=[]
for row in file_list[1:]:
    val= row[9]
    if val!="":
        killed.append(float(row[9]))
        
print("Total killed",int(sum(killed)))

Total killed 26445


In [5]:
# We can use DictReader instead , data will be convereted into key value pir, key will be column header 
import csv
# every row is conveted into dictionary
# by default it takes 0th row as key
with open('year2017-7767.csv') as file_obj:
    file_data = csv.DictReader(file_obj, skipinitialspace=True)
    print(type(file_data)) 
    for row in file_data[:2]:
        print(row)
# instead of indexes, now we can use keys for extracting

<class 'csv.DictReader'>


TypeError: 'DictReader' object is not subscriptable

In [8]:
import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.DictReader(file_obj, skipinitialspace=True)
    file_data_list = list(file_data)
    
    for row in file_data_list[:2]:
        print(row)

{'Year': '2017', 'Month': '1', 'Day': '2', 'Country': 'Afghanistan', 'Region': 'South Asia', 'city': 'Takhta Pul', 'latitude': '31.320556', 'longitude': '65.961111', 'AttackType': 'Hostage Taking (Kidnapping)', 'Killed': '0.0', 'Wounded': '0.0', 'Target': 'Construction Workers', 'Group': 'Taliban', 'Target_type': 'Business', 'Weapon_type': 'Firearms', 'casualities': '0.0'}
{'Year': '2017', 'Month': '1', 'Day': '3', 'Country': 'Sudan', 'Region': 'Sub-Saharan Africa', 'city': 'Fantaga', 'latitude': '12.921007000000001', 'longitude': '24.318324', 'AttackType': 'Armed Assault', 'Killed': '2.0', 'Wounded': '0.0', 'Target': 'Civilians: Haroun Yousif, Hamid Ibrahim', 'Group': 'Unknown', 'Target_type': 'Private Citizens & Property', 'Weapon_type': 'Firearms', 'casualities': '2.0'}


In [12]:
import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.DictReader(file_obj, skipinitialspace=True)
    file_data_list = list(file_data)
    for row in file_data_list[:10]:
        print(row['Weapon_type'])

Firearms
Firearms
Melee
Melee
Firearms
Explosives
Explosives
Explosives
Explosives
Firearms


In [17]:
# Total no of people killed from each country

import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.DictReader(file_obj, skipinitialspace=True)

    data_dict = {}
    for row in file_data:
        str_killed = row['Killed']
        
        if str_killed != "":
            int_killed = int(float(str_killed))  # Convert to integer, handling floating-point values
        else:
            int_killed = 0

        data_dict[row['Country']] = data_dict.get(row['Country'], 0) + int_killed
    print(data_dict)

    # for key, value in data_dict.items():  # Use items() to access both keys and values
    #     print(key, value)

{'Afghanistan': 6092, 'Sudan': 82, 'Democratic Republic of the Congo': 596, 'Turkey': 222, 'Syria': 2026, 'Pakistan': 1076, 'Italy': 0, 'Somalia': 1912, 'Yemen': 762, 'Bahrain': 6, 'Myanmar': 218, 'Burundi': 20, 'Iraq': 6476, 'Egypt': 877, 'Burkina Faso': 53, 'India': 465, 'Algeria': 12, 'United States': 95, 'Philippines': 496, 'Greece': 0, 'Mali': 361, 'Libya': 289, 'Central African Republic': 601, 'Nigeria': 1805, 'Lebanon': 17, 'Mexico': 23, 'Cameroon': 228, 'Ethiopia': 67, 'Kyrgyzstan': 0, 'Serbia': 0, 'Sweden': 5, 'Thailand': 72, 'Iran': 39, 'France': 7, 'United Kingdom': 42, 'West Bank and Gaza Strip': 50, 'Ukraine': 40, 'Paraguay': 4, 'Colombia': 84, 'Malaysia': 4, 'Russia': 61, 'Kosovo': 0, 'South Africa': 21, 'Chile': 0, 'Kenya': 126, 'Israel': 3, 'Saudi Arabia': 31, 'China': 16, 'Nepal': 4, 'Ecuador': 0, 'Niger': 148, 'Venezuela': 5, 'South Sudan': 581, 'Canada': 6, 'Bangladesh': 25, 'Tajikistan': 1, 'Angola': 7, 'Ireland': 0, 'Peru': 8, 'Dominican Republic': 2, 'Poland': 0, 

In [18]:
# Total no of people killed from each country
import csv

with open('year2017-7767.csv') as file_obj:
    file_data = csv.DictReader(file_obj, skipinitialspace=True)
    file_list = list(file_data)
    
    country_killed = {}
    
    for row in file_list:
        key = row['Country']
        value = row['Killed']
        if value != "":
            value = int(float(value))  # Convert to integer, handling floating-point values
        else:
            value = 0
        country_killed[key] = country_killed.get(key, 0) + value
        
    print(country_killed)

{'Afghanistan': 6092, 'Sudan': 82, 'Democratic Republic of the Congo': 596, 'Turkey': 222, 'Syria': 2026, 'Pakistan': 1076, 'Italy': 0, 'Somalia': 1912, 'Yemen': 762, 'Bahrain': 6, 'Myanmar': 218, 'Burundi': 20, 'Iraq': 6476, 'Egypt': 877, 'Burkina Faso': 53, 'India': 465, 'Algeria': 12, 'United States': 95, 'Philippines': 496, 'Greece': 0, 'Mali': 361, 'Libya': 289, 'Central African Republic': 601, 'Nigeria': 1805, 'Lebanon': 17, 'Mexico': 23, 'Cameroon': 228, 'Ethiopia': 67, 'Kyrgyzstan': 0, 'Serbia': 0, 'Sweden': 5, 'Thailand': 72, 'Iran': 39, 'France': 7, 'United Kingdom': 42, 'West Bank and Gaza Strip': 50, 'Ukraine': 40, 'Paraguay': 4, 'Colombia': 84, 'Malaysia': 4, 'Russia': 61, 'Kosovo': 0, 'South Africa': 21, 'Chile': 0, 'Kenya': 126, 'Israel': 3, 'Saudi Arabia': 31, 'China': 16, 'Nepal': 4, 'Ecuador': 0, 'Niger': 148, 'Venezuela': 5, 'South Sudan': 581, 'Canada': 6, 'Bangladesh': 25, 'Tajikistan': 1, 'Angola': 7, 'Ireland': 0, 'Peru': 8, 'Dominican Republic': 2, 'Poland': 0, 

In [19]:
# Print 100 Bytes of file and print them

with open('sample-7766.txt', 'r') as file_obj:
    data = file_obj.read(100) #no of bytes to read =100
data

'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet'

In [20]:
# Print 5 Lines using readline

with open('sample-7766.txt', 'r') as file_obj:
    print(file_obj.readline())
    print(file_obj.readline())
    print(file_obj.readline())
    print(file_obj.readline())
    print(file_obj.readline())

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet luctus ligula laoreet ut. Vestibulum ullamcorper accumsan velit vel vehicula. Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum mi. In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus pellentesque. Nullam id elementum ipsum. Suspendisse cursus lobortis viverra. Proin et erat at mauris tincidunt porttitor vitae ac dui.

Donec vulputate lorem tortor, nec fermentum nibh bibendum vel. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent dictum luctus massa, non euismod lacus. Pellentesque condimentum dolor est, ut dapibus lectus luctus ac. Ut sagittis commodo arcu. Integer nisi nulla, facilisis sit amet nulla quis, eleifend suscipit purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel. Phasellus dictum j

In [21]:
# Print 3 Lines using readlines

with open('sample-7766.txt', 'r') as file_obj:
    data= file_obj.readlines() #produces list of lines of strings
    for i in data[:3]:
        print(i)

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet luctus ligula laoreet ut. Vestibulum ullamcorper accumsan velit vel vehicula. Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum mi. In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus pellentesque. Nullam id elementum ipsum. Suspendisse cursus lobortis viverra. Proin et erat at mauris tincidunt porttitor vitae ac dui.

Donec vulputate lorem tortor, nec fermentum nibh bibendum vel. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent dictum luctus massa, non euismod lacus. Pellentesque condimentum dolor est, ut dapibus lectus luctus ac. Ut sagittis commodo arcu. Integer nisi nulla, facilisis sit amet nulla quis, eleifend suscipit purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel. Phasellus dictum j

In [24]:
# First 3 lines using DictReader
import csv

with open('year2017-7767.csv') as file_obj:
    data = csv.DictReader(file_obj,skipinitialspace=True)
    data_list =list(data)
    
    for i in data_list[:3]:
        print(i)
        print()

{'Year': '2017', 'Month': '1', 'Day': '2', 'Country': 'Afghanistan', 'Region': 'South Asia', 'city': 'Takhta Pul', 'latitude': '31.320556', 'longitude': '65.961111', 'AttackType': 'Hostage Taking (Kidnapping)', 'Killed': '0.0', 'Wounded': '0.0', 'Target': 'Construction Workers', 'Group': 'Taliban', 'Target_type': 'Business', 'Weapon_type': 'Firearms', 'casualities': '0.0'}

{'Year': '2017', 'Month': '1', 'Day': '3', 'Country': 'Sudan', 'Region': 'Sub-Saharan Africa', 'city': 'Fantaga', 'latitude': '12.921007000000001', 'longitude': '24.318324', 'AttackType': 'Armed Assault', 'Killed': '2.0', 'Wounded': '0.0', 'Target': 'Civilians: Haroun Yousif, Hamid Ibrahim', 'Group': 'Unknown', 'Target_type': 'Private Citizens & Property', 'Weapon_type': 'Firearms', 'casualities': '2.0'}

{'Year': '2017', 'Month': '1', 'Day': '1', 'Country': 'Democratic Republic of the Congo', 'Region': 'Sub-Saharan Africa', 'city': 'Saboko', 'latitude': '1.452372', 'longitude': '29.875162', 'AttackType': 'Armed Ass

In [26]:
# First 3 lines using reader
import csv

with open('year2017-7767.csv') as file_obj:
    data = csv.reader(file_obj,skipinitialspace=True) 
    data_list = list(data) #list of rows
    j=0
    for i in data_list[1:]:
        if j>3:
            break
        print(i)
        print()
        j+=1

['2017', '1', '2', 'Afghanistan', 'South Asia', 'Takhta Pul', '31.320556', '65.961111', 'Hostage Taking (Kidnapping)', '0.0', '0.0', 'Construction Workers', 'Taliban', 'Business', 'Firearms', '0.0']

['2017', '1', '3', 'Sudan', 'Sub-Saharan Africa', 'Fantaga', '12.921007000000001', '24.318324', 'Armed Assault', '2.0', '0.0', 'Civilians: Haroun Yousif, Hamid Ibrahim', 'Unknown', 'Private Citizens & Property', 'Firearms', '2.0']

['2017', '1', '1', 'Democratic Republic of the Congo', 'Sub-Saharan Africa', 'Saboko', '1.452372', '29.875162', 'Armed Assault', '7.0', '0.0', 'Village', 'Allied Democratic Forces (ADF)', 'Private Citizens & Property', 'Melee', '7.0']

['2017', '1', '1', 'Democratic Republic of the Congo', 'Sub-Saharan Africa', 'Bialee', '1.4523700000000002', '29.875186', 'Armed Assault', '7.0', '0.0', 'Village', 'Allied Democratic Forces (ADF)', 'Private Citizens & Property', 'Melee', '7.0']



In [27]:
# Print all Column names
import csv

with open('year2017-7767.csv') as file_obj:
    data = csv.reader(file_obj,skipinitialspace=True)
    data_list = list(data) 
    for i in data_list[0]: #first is colum names
        print(i)

Year
Month
Day
Country
Region
city
latitude
longitude
AttackType
Killed
Wounded
Target
Group
Target_type
Weapon_type
casualities


In [28]:
# First 10 country names
import csv

with open('year2017-7767.csv') as file_obj:
    data = csv.DictReader(file_obj,skipinitialspace=True)
    data_list = list(data)
    j=0
    for i in data_list:
        if j>10:
            break
        print(i['Country'])
        j+=1

Afghanistan
Sudan
Democratic Republic of the Congo
Democratic Republic of the Congo
Turkey
Syria
Pakistan
Italy
Turkey
Turkey
Somalia


In [31]:
# Total Wounded People
import csv

with open('year2017-7767.csv','r') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    data_list = list(data)
    s=0
    for i in data_list:
        val= i['Wounded']
        if val!="":
            val= int(float(val))
        else:
            val=0
        s+=val
print(s)

24927


In [1]:
# Total Wounded From India
import csv

with open('year2017-7767.csv', 'r') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)

    wounded_india = []
    for row in data:
        country = row['Country']
        val = row['Wounded']

        if val != "" and country == "India":
            wounded_india.append(int(float(val)))

total_wounded_india = sum(wounded_india)
print(total_wounded_india)

702


In [2]:
# Casulties from Explosives

import csv

with open('year2017-7767.csv', 'r') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    casualities =[]
    
    for row in data:
        val = row['Weapon_type']
        temp = row['casualities']
        if(val == 'Explosives' and temp!=''):
            casualities.append(int(float(temp)))

print(sum(casualities))

29280


In [3]:
# Month vs Killed
# Total no of people killed in each month
# Print the month and count of killed people as integer value
import csv

with open('year2017-7767.csv', 'r') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    dict ={}
    for row in data:
        month= row['Month']
        killed= row['Killed'] 
        if killed!='':
            killed=int(float(killed))
        else:
            killed=0
        dict[month]=dict.get(month,0) + killed
for key,value in dict.items():
    print(key,value)

1 2275
2 2027
3 2463
4 2142
5 2936
6 2506
7 2228
8 2145
9 1764
10 2580
11 2014
12 1365


In [5]:
# Country vs casualities
import csv

with open('year2017-7767.csv', 'r') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)
    dict ={}
    for row in data:
        country= row['Country']
        casualities= row['casualities'] 
        if casualities!='':
            casualities=int(float(casualities))
        else:
            casualities=0
        dict[country]=dict.get(country,0) + casualities

In [6]:
# Amazon jobs Dataset from 2011 to 2018

# Find no of job openings in Banglore,IN and in Seattle,US
import csv

# Specify the correct encoding when opening the CSV file
with open('amazon_jobs_dataset.csv', 'r', encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)

    india_openings = 0
    usa_openings = 0

    for row in data:
        location = row['location'].lower()
        if "bangalore" in location:
            india_openings += 1
        if "seattle" in location:
            usa_openings += 1

print("Number of job openings in Bangalore, IN:", india_openings)
print("Number of job openings in Seattle, US:", usa_openings)


Number of job openings in Bangalore, IN: 66
Number of job openings in Seattle, US: 1856


In [7]:
# Job Computer vision

import csv

with open('amazon_jobs_dataset.csv', 'r', encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)

    computer_vision_openings = 0

    for row in data:
        job_title = row['Title'].lower()
        if "computer vision" in job_title:
            computer_vision_openings += 1

print("Number of job openings in Computer Vision:", computer_vision_openings)

Number of job openings in Computer Vision: 14


In [8]:
# Job openings Canada

import csv

with open('amazon_jobs_dataset.csv', 'r', encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)

    canada_openings = 0

    for row in data:
        location =("CA" in row['location'].split(','))
        if location:
            canada_openings += 1

print("Number of job openings in Canada:", canada_openings)

Number of job openings in Canada: 156


In [10]:
# Job Month 2018
# Find the month having most job openings in year 2018
# Job openings Canada

import csv

with open('amazon_jobs_dataset.csv', 'r', encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)

    month_openings={}
    
    for row in data:
        date =row['Posting_date']
        if '2018' in date:
            posting_date =date.split(',')
            month = posting_date[0].split()[0]
            month_openings[month] = month_openings.get(month,0)+1

month = max(month_openings, key=month_openings.get)

print("Month having most job openings in Year 2018 ",month," ",month_openings[month] )

Month having most job openings in Year 2018  January   907


In [11]:
# Job Degree
# Bachelor degree as basic qualification

import csv

with open('amazon_jobs_dataset.csv', 'r', encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj, skipinitialspace=True)

    total_positions=0
    
    for row in data:
        basic_qualification = row['BASIC QUALIFICATIONS'].lower()
        if ("bachelor" in basic_qualification) or ("bs" in basic_qualification ) or "ba" in basic_qualification:
            total_positions+=1

print("Total Jobs for Bachelors Degree is:  ",total_positions )

Total Jobs for Bachelors Degree is:   3217


In [13]:
# Language Jobs
import csv

with open('amazon_jobs_dataset.csv','r',encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj,skipinitialspace=True)

    lang_openings={}

    for row in data:
        basic_qualification = row['BASIC QUALIFICATIONS']
        location = row['location']
        isIndia=False
        if ('India' in location) or "IN" in location:
            isIndia =True
        if isIndia and (("Bachelor" in basic_qualification) or ("BS" in basic_qualification ) or ("BA" in basic_qualification)):
            if 'Java' in basic_qualification:
                lang_openings['Java']=lang_openings.get('Java',0)+1
            if 'C++' in basic_qualification:
                lang_openings['C++']=lang_openings.get('C++',0)+1
            if 'Python' in basic_qualification:
                lang_openings['Python']=lang_openings.get('Python',0)+1

for key,value in lang_openings.items():
    print(key,value)

# print(max(lang_openings,key=lang_openings.get))

Java 126
C++ 86
Python 37


In [14]:
# Language Jobs

import csv

with open('amazon_jobs_dataset.csv','r',encoding='utf-8') as file_obj:
    data = csv.DictReader(file_obj,skipinitialspace=True)

    amazon_country_openings={}
    for row in data:
        basic_qualification = row['BASIC QUALIFICATIONS']
        location = row['location']
        if 'Java' in basic_qualification:
            amazon_country_openings[location]=amazon_country_openings.get(location,0)+1


loc=max(amazon_country_openings,key=amazon_country_openings.get)
print(loc,amazon_country_openings[loc])

US, WA, Seattle  1310
