# CS 39006: Networks Lab
# Assignment 2: Packet Sniffer and Packet Analyzer -- Exploring Further

## Name : Radhika Patwari
## Roll no. : 18CS10062

## Installing 3 libraries required for various operations


1.   xmltodict : Parses an XML file (here, the .pdml export) into a dictionary so that it can be processed in Python [Info :  https://pypi.org/project/xmltodict/]
2.   ip2geotools : Takes an IP address and returns its geolocation, including a 2-character country code
3.   pycountry : Takes the country code and returns the corresponding country name

In [None]:
!pip install xmltodict
!pip install ip2geotools
!pip install pycountry

## Importing the required libraries

In [2]:
import xmltodict                                            # xml to dict library
import json                                                 # printing and checking in json format
from ip2geotools.databases.noncommercial import DbIpCity    # ip to country code
import pycountry                                            # country code to country name conversion

## Reading the XML file and storing it as a dictionary

### Upload the .pdml file containing the HTTP requests to Google Colab and set `xml_file` to its path

In [3]:
xml_file = '/content/assign2_http_get_requests.pdml'
with open(xml_file, 'r', encoding='utf-8') as fd:
    xml_data = xmltodict.parse(fd.read())

## Performing minor tests to ensure proper storage of parsed xml file

In [None]:
print(len(xml_data['pdml']['packet']))
print(type(xml_data['pdml']['packet']))
print(json.dumps(xml_data['pdml']['packet'][0]['proto'][5],indent=4))
print(type(xml_data['pdml']['packet'][0]['proto'][0]))
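
One thing these type checks can reveal: xmltodict returns a list only when an element has several children with the same name, and a single dict when there is just one. The sketch below shows one way to guard against that before looping. `as_list` is a hypothetical helper (not part of xmltodict), and the sample packet is hand-written rather than taken from the capture.

```python
def as_list(value):
    """Return value wrapped in a list unless it already is one (None -> [])."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# Hand-written stand-in for a parsed packet with a single <proto> child,
# which xmltodict would represent as a dict instead of a list.
packet = {
    'proto': {
        '@name': 'http',
        'field': {'@name': 'http.host', '@show': 'example.org'},
    }
}

# The same loop works whether 'proto' and 'field' are dicts or lists.
for proto in as_list(packet['proto']):
    for field in as_list(proto.get('field')):
        print(proto['@name'], field['@name'], field['@show'])
```

Wrapping `packet['proto']` and `proto['field']` this way keeps later loops from iterating over dictionary keys by accident.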

## Extracting IP Addresses of clients that accessed LearnBasics service through the FreeBasics HTTP proxy (Internet.org) of Facebook

### There are 3 types of HTTP requests :


1.   Users that accessed the LearnBasics service directly over the web, without any proxy
2.   Users that accessed the LearnBasics service via the FreeBasics HTTP Proxy (Internet.org)
3.   Users that accessed the LearnBasics service via a third-party proxy server (other than the FreeBasics proxy server)

For an HTTP request that passes through a proxy, the field `x_forwarded_for` contains the IP address of the original user. Requests without this field therefore come from users who accessed the LearnBasics service without a proxy server.
A second field, `Via: Internet.org`, indicates that the request came through the Internet.org proxy server provided by Facebook.
Thus, the requests that have both fields are the ones that came through the FreeBasics proxy server.
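
The three categories can be sketched as a small classifier over the HTTP fields of one packet. This is only an illustration: `classify_request` and the category labels are hypothetical names, the sample fields are hand-written (using documentation-range IP addresses), and the `Via` comparison strips whitespace instead of matching an exact string.

```python
def classify_request(http_fields):
    """Classify one request from its list of PDML-style HTTP field dicts."""
    has_forwarded_for = any(
        f.get('@name') == 'http.x_forwarded_for' for f in http_fields
    )
    via_internet_org = any(
        f.get('@show', '').strip() == 'Via: Internet.org' for f in http_fields
    )
    if has_forwarded_for and via_internet_org:
        return 'freebasics_proxy'   # both fields present
    if has_forwarded_for:
        return 'other_proxy'        # forwarded, but not by Internet.org
    return 'direct'                 # no x_forwarded_for field at all

# Hand-written sample fields, not taken from the capture.
forwarded = {'@name': 'http.x_forwarded_for', '@show': '203.0.113.7'}
via = {'@name': 'http.request.line', '@show': 'Via: Internet.org '}
host = {'@name': 'http.host', '@show': 'example.org'}

print(classify_request([host, forwarded, via]))
print(classify_request([host, forwarded]))
print(classify_request([host]))
```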

### The code below extracts these IP addresses by checking these 2 fields in every HTTP packet :

[ Total 2534 HTTP GET requests have been made through FreeBasics HTTP Proxy ]

In [5]:
ctr_used_proxy = 0              # counts requests coming through some proxy server
ctr_used_freebasic_proxy = 0    # counts requests coming through FreeBasics Proxy server (Internet.org)
ip_addresses = []               # stores the final list of required ip addresses

for packet in xml_data['pdml']['packet']:
  used_proxy = False
  used_freebasic_proxy = False
  for proto in packet['proto']:
    if proto['@name'] == 'http':
      for http in proto['field']:
        # x_forwarded_for carries the ip address of the original user
        if http['@name'] == 'http.x_forwarded_for':
          used_proxy = True
          ctr_used_proxy += 1
          ip = http['@show']
        # the Via line identifies the Internet.org (FreeBasics) proxy
        if http['@show'] == 'Via: Internet.org ':
          used_freebasic_proxy = True
          ctr_used_freebasic_proxy += 1
  # keep the ip address only when both fields were seen in this packet
  if used_proxy and used_freebasic_proxy:
    ip_addresses.append(ip)

print(ctr_used_proxy)
print(ctr_used_freebasic_proxy)
print(len(ip_addresses))

2539
2534
2534


## Removing duplicate elements from the list of ip addresses

[ Total 481 distinct IP addresses are present ] 

In [6]:
ip_addresses = set(ip_addresses)
print(len(ip_addresses))

481
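
In general, an `X-Forwarded-For` value can hold a comma-separated chain of addresses when a request passes through several proxies, with the original client listed first. Whether this capture contains such values is not checked here, so the following is only a sketch of how values could be normalised before deduplication. `first_client_ip` is a hypothetical helper and the sample values are made up.

```python
import ipaddress

def first_client_ip(header_value):
    """Return the left-most entry of the value if it is a valid IP, else None."""
    candidate = header_value.split(',')[0].strip()
    try:
        # ip_address() validates both IPv4 and IPv6 text forms
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return None

# Made-up header values using documentation-range addresses.
raw_values = ['203.0.113.7, 198.51.100.2', ' 203.0.113.7', 'unknown']

# Normalise first, then deduplicate with a set comprehension.
unique_ips = {ip for ip in map(first_client_ip, raw_values) if ip}
print(unique_ips)
```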


## IP address to country name conversion 

### We iterate through `ip_addresses`, look up the country code and the corresponding country name for each IP address, and store the result in `country_ip`

In [None]:
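country_ip = {}    # maps country name -> list of ip addresses located in that country
for ctr, ip in enumerate(ip_addresses, start=1):
  response = DbIpCity.get(ip, api_key='free')                   # geolocation lookup (network call)
  country = pycountry.countries.get(alpha_2=response.country)   # None if the code is unknown (current pycountry releases)
  # fall back to the raw country code when pycountry has no matching entry
  country_name = country.name if country else (response.country or 'Unknown')
  country_ip.setdefault(country_name, []).append(ip)
  print('num : ',ctr,' | ip : ',ip,' | country code : ',response.country,' | country : ',country_name)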
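
Since every lookup is a network call, a single failure can stop the whole loop. The sketch below shows one way to separate the grouping logic from the lookup so that failed addresses are collected instead of raising. `group_by_country` and `fake_lookup` are hypothetical names, and the IP-to-country table is invented purely for illustration (the addresses are documentation-range addresses with no real location).

```python
from collections import defaultdict

def group_by_country(ips, lookup):
    """Group ips by lookup(ip); return (groups, ips whose lookup failed)."""
    grouped = defaultdict(list)
    failed = []
    for ip in sorted(ips):
        try:
            grouped[lookup(ip)].append(ip)
        except Exception:  # a real lookup may fail on network errors or rate limits
            failed.append(ip)
    return dict(grouped), failed

# Invented mapping that stands in for a real geolocation service.
FAKE_TABLE = {'203.0.113.7': 'Iraq', '203.0.113.8': 'Iraq', '198.51.100.2': 'Kenya'}

def fake_lookup(ip):
    return FAKE_TABLE[ip]  # raises KeyError for unknown addresses

grouped, failed = group_by_country(
    {'203.0.113.7', '203.0.113.8', '198.51.100.2', '192.0.2.1'}, fake_lookup
)
print(grouped)
print(failed)
```

The addresses in `failed` could then be retried later without repeating the lookups that already succeeded.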

## Sorting the dictionary alphabetically and counting the users per country

In [None]:
ans = {}
for i in sorted(country_ip):
  ans[i] = len(country_ip[i])
country_ip = ans
print(json.dumps(country_ip, indent=4))
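
The same counting step can be written as a dict comprehension, and the result can also be ranked by user count rather than by name. This is a minimal sketch on made-up sample data; `sample`, `counts` and `by_users` are illustrative names.

```python
# Made-up sample in the same shape as country_ip before counting.
sample = {
    'Kenya': ['198.51.100.2'],
    'Iraq': ['203.0.113.7', '203.0.113.8'],
}

# Alphabetical order, with each list of addresses replaced by its length.
counts = {country: len(sample[country]) for country in sorted(sample)}

# Optional ranking: most users first, ties broken alphabetically.
by_users = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(counts)
print(by_users)
```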

## Converting data into a csv file with headers 'Country' and 'Number of Users'

Here `Number of Users` stores the number of users (distinct IP addresses) from the corresponding `Country` who accessed the LearnBasics service via the FreeBasics HTTP Proxy

In [14]:
import csv
csv_columns = ['Country','Number of Users']
csv_file = "freebasic_users.csv"

try:
    # newline='' lets the csv module control line endings (avoids blank rows on some platforms)
    with open(csv_file, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
        writer.writeheader()
        for country, users in country_ip.items():
            writer.writerow({'Country': country, 'Number of Users': users})
except IOError:
    print("I/O error")
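
To check the CSV layout without touching the file system, the same writer can be pointed at an in-memory buffer and read back. This is a sketch on made-up counts and is not part of the assignment code. Note that values read back through `csv` are strings.

```python
import csv
import io

# Made-up counts in the same shape as country_ip after counting.
counts = {'Iraq': 2, 'Kenya': 1}
columns = ['Country', 'Number of Users']

# Write to an in-memory text buffer instead of a file.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
for country, users in counts.items():
    writer.writerow({'Country': country, 'Number of Users': users})

# Rewind and read the rows back as dictionaries keyed by the header row.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows)
```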

## Reading content of generated csv file using pandas 

[ The total number of countries that accessed the LearnBasics app via the FreeBasics server is 29 ]

In [15]:
import pandas 
csvFile = pandas.read_csv(csv_file) 
print(csvFile) 

                                  Country  Number of Users
0                                  Angola                2
1                              Bangladesh               13
2                                   Benin                3
3                                Cambodia                1
4                                Colombia                2
5   Congo, The Democratic Republic of the               14
6                                   Ghana                6
7                                  Guinea                1
8                                   India                1
9                               Indonesia               27
10                                   Iraq               57
11                                  Kenya                3
12                                 Malawi                6
13                               Maldives                1
14                                 Mexico               18
15                               Mongolia               

## Downloading the generated csv file on local machine if required

In [16]:
from google.colab import files
files.download("freebasic_users.csv")
