# Accessing InternetDB and Combining with CIC Data

## Overview: Connect to DB, Scrape, Initial Observations, Clean Dataset, and Combine


---
## Project Workflow:
1. **Step 1: Install and Import Necessary Libraries**
2. **Step 2: Load CIC Dataset**
3. **Step 3: InternetDB Scraping Function**
    - create function
    - test function, see relevant features
4. **Step 4: Create New IP Vulnerability DF**
    - columns taken from the relevant features found in Step 3
    - loop over the unique destination IPs in the CIC data and populate the columns with InternetDB data



# Step 1: Install and Import Necessary Libraries


In [1]:
import requests
import pandas as pd

# Step 2: Load CIC Dataset

In [2]:

file_path = r"data/CIC_Camera_data.csv"

CIC_data = pd.read_csv(file_path)

CIC_data.sample(10)

Unnamed: 0,stream,user_agent,src_mac,dst_mac,dst_ip,dst_port,time_since_previously_displayed_frame,inter_arrival_time,eth_size,payload_length,...,stream_10_count,stream_10_mean,stream_10_var,src_ip_10_count,src_ip_10_mean,src_ip_10_var,channel_10_count,channel_10_mean,channel_10_var,traffic_type
20494,-1,none,Wyze Camera,3c:18:a0:41:c3:a0,192.168.137.1,0,0.016649,254631.171214,98,40,...,44.0,78.863636,4514.679,13.0,77.076923,314.2436,3.0,98.0,0.0,BF
69431,28,none,HeimVision Smart WiFi Camera,3c:18:a0:41:c3:a0,47.88.56.147,50920,0.010015,1956.250382,93,39,...,7.0,100.857143,2731.143,6.0,150.0,5694.0,6.0,82.0,290.4,XSS
54918,1,none,Nest Indoor Camera,3c:18:a0:41:c3:a0,35.185.101.66,443,0.034912,4952.997684,66,0,...,15.0,121.733333,3272.495,7.0,143.142857,1157.143,13.0,107.538462,2180.769,BN
51226,27,none,Amazon Echo Show,Amazon Echo Dot 2,192.168.137.58,55444,0.010229,4247.637773,60,16,...,36.0,61.111111,16.27302,29.0,126.0,28145.71,38.0,87.526316,14908.26,BN
45118,3,none,Amazon Echo Show,3c:18:a0:41:c3:a0,34.158.253.218,4070,0.023718,2950.758518,66,0,...,4.0,150.75,8394.25,45.0,102.0,22018.0,2.0,71.5,60.5,BN
15334,263,none,Amazon Echo Show,Amazon Echo Dot 2,192.168.137.210,55443,0.002354,250219.564804,66,0,...,4.0,319.0,103236.7,5.0,355.6,157252.8,7.0,314.0,109899.0,BF
68510,313,none,Netatmo Camera,3c:18:a0:41:c3:a0,51.145.143.28,443,0.000474,1624.954878,1494,1428,...,101.0,1048.019802,838200.8,171.0,1482.415205,483772.6,293.0,933.467577,755573.1,XSS
76454,440,none,Netatmo Camera,3c:18:a0:41:c3:a0,51.145.143.28,443,0.001697,1980.988848,2922,2856,...,117.0,1011.367521,2664849.0,64.0,1705.453125,3713449.0,117.0,1011.367521,2664849.0,BM
14454,2067,none,Wyze Camera,3c:18:a0:41:c3:a0,100.21.21.34,443,0.002633,249940.912076,1514,1448,...,17.0,506.411765,858181.3,34.0,165.617647,75451.94,17.0,506.411765,858181.3,BF
17099,2400,none,Netatmo Camera,3c:18:a0:41:c3:a0,51.145.143.28,443,0.004369,250320.938675,192,126,...,12.0,566.083333,824207.0,120.0,1274.658333,252950.3,186.0,909.564516,526193.9,BF


# Step 3: InternetDB Scraping Function

In [3]:
# AI-------------
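def fetch_internetdb_data(ip):
    """Return the InternetDB record for `ip` as a dict, or None if unavailable."""
    url = f"https://internetdb.shodan.io/{ip}"
    try:
        # A timeout keeps one unresponsive request from stalling the whole run
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return None
    # Any non-200 status is treated as "no data for this IP"
    if response.status_code == 200:
        return response.json()
    return None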

In [4]:
# test function
result = fetch_internetdb_data("35.186.43.132")
print(result)

{'cpes': [], 'hostnames': ['132.43.186.35.bc.googleusercontent.com', 'nexus-us1.dropcam.com', 'oculus7759-us1.dropcam.com', 'nexus.dropcam.com'], 'ip': '35.186.43.132', 'ports': [80, 443, 1443], 'tags': ['cloud'], 'vulns': []}


In [5]:
result = fetch_internetdb_data("47.88.56.147")
print(result)

{'cpes': ['cpe:/a:f5:nginx', 'cpe:/a:wordpress:wordpress', 'cpe:/a:openbsd:openssh:8.0', 'cpe:/a:f5:nginx:1.26.1', 'cpe:/a:php:php', 'cpe:/a:jquery:jquery:1.8.3', 'cpe:/a:mysql:mysql'], 'hostnames': ['pacmc.net.cn', 'www.pacmc.net.cn'], 'ip': '47.88.56.147', 'ports': [22, 80, 443], 'tags': ['cloud'], 'vulns': ['CVE-2012-6708', 'CVE-2020-14145', 'CVE-2020-15778', 'CVE-2023-48795', 'CVE-2016-20012', 'CVE-2019-16905', 'CVE-2007-2768', 'CVE-2023-51767', 'CVE-2008-3844', 'CVE-2020-7656', 'CVE-2021-36368', 'CVE-2015-9251', 'CVE-2019-11358', 'CVE-2020-11023', 'CVE-2023-51385', 'CVE-2023-38408', 'CVE-2020-11022', 'CVE-2021-41617']}


## Results
- relevant features: open ports (`ports`), tags (`tags`), vulnerabilities (`vulns`)
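
The three fields listed above can be pulled out of a response into a flat record. The sketch below is illustrative only: `summarize_record` is a hypothetical helper that is not part of this notebook, and `sample` is a shortened copy of the second test response (its `cpes`, `hostnames`, and `vulns` lists are truncated for brevity).

```python
def summarize_record(record):
    """Keep only the fields judged relevant, plus simple counts."""
    ports = record.get("ports", [])
    tags = record.get("tags", [])
    vulns = record.get("vulns", [])
    return {
        "ip": record.get("ip"),
        "open_ports": ports,
        "tags": tags,
        "vulns": vulns,
        "n_open_ports": len(ports),
        "n_vulns": len(vulns),
    }


# Shortened copy of the response for 47.88.56.147 (lists truncated)
sample = {
    "cpes": ["cpe:/a:f5:nginx"],
    "hostnames": ["pacmc.net.cn"],
    "ip": "47.88.56.147",
    "ports": [22, 80, 443],
    "tags": ["cloud"],
    "vulns": ["CVE-2012-6708", "CVE-2020-14145"],
}

summary = summarize_record(sample)
print(summary["n_open_ports"], summary["n_vulns"])
```

Using `.get` with an empty-list default means a response that lacks one of the keys still yields a complete record.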

# Step 4: Create New IP Vulnerability DF

In [6]:
IP_data = pd.DataFrame(columns=['IP', 'open_ports', 'tags', 'vulns', 'camera_model'])

IP_data

Unnamed: 0,IP,open_ports,tags,vulns,camera_model


In [7]:
unique_ips = CIC_data['dst_ip'].unique().tolist()
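
The sample in Step 2 includes destinations such as `192.168.137.1`, which are private LAN addresses. Assuming an internet-wide scan database has nothing useful for such addresses, one option is to drop them before querying. This is a sketch under that assumption: `is_public_ip` and `sample_ips` are hypothetical names, while `ipaddress` is part of the Python standard library.

```python
import ipaddress


def is_public_ip(ip):
    """Return True only for syntactically valid, globally routable addresses."""
    try:
        return ipaddress.ip_address(ip).is_global
    except ValueError:
        # Not a valid IPv4/IPv6 address string
        return False


# Two addresses from the Step 2 sample, one private address, one invalid string
sample_ips = ["192.168.137.1", "47.88.56.147", "35.185.101.66", "not-an-ip"]
public_ips = [ip for ip in sample_ips if is_public_ip(ip)]
print(public_ips)
```

In this notebook the same filter could be applied as `unique_ips = [ip for ip in unique_ips if is_public_ip(ip)]`.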

In [None]:
# AI----------------------------
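rows = []
for ip in unique_ips:
    data = fetch_internetdb_data(ip)
    if data:
        # src_mac holds the source device's name in this dataset (see the Step 2 sample)
        camera_model = CIC_data.loc[CIC_data['dst_ip'] == ip, 'src_mac'].iloc[0]
        rows.append({
            'IP': ip,
            'open_ports': data.get('ports', []),
            'tags': data.get('tags', []),
            'vulns': data.get('vulns', []),
            'camera_model': camera_model
        })

# Build the DataFrame once instead of concatenating inside the loop
IP_data = pd.DataFrame(rows, columns=['IP', 'open_ports', 'tags', 'vulns', 'camera_model'])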
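
Because this step issues one HTTP request per address, it may be worth pacing the requests and recording which addresses returned nothing. The following is a sketch, not this notebook's method: `collect_records`, its `delay` parameter, and the stand-in `fake_fetch` are all hypothetical, and whether InternetDB needs any pacing is an assumption.

```python
import time


def collect_records(ips, fetch, delay=0.0):
    """Call fetch(ip) for each address, pausing `delay` seconds between calls.

    Returns (records, missing): a dict of ip -> response for addresses that
    returned data, and a list of addresses that returned nothing.
    """
    records, missing = {}, []
    for i, ip in enumerate(ips):
        if i and delay:
            time.sleep(delay)  # no pause before the first request
        data = fetch(ip)
        if data:
            records[ip] = data
        else:
            missing.append(ip)
    return records, missing


def fake_fetch(ip):
    """Offline stand-in for the real lookup function, for demonstration only."""
    known = {"47.88.56.147": {"ip": "47.88.56.147", "ports": [22, 80, 443]}}
    return known.get(ip)


records, missing = collect_records(["47.88.56.147", "192.168.137.1"], fake_fetch)
print(sorted(records), missing)
```

With the real lookup this would be called as `collect_records(unique_ips, fetch_internetdb_data, delay=0.5)`, where the half-second delay is an arbitrary illustrative value.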


In [None]:
IP_data.sample(10)

Unnamed: 0,IP,open_ports,tags,vulns,camera_model
15,52.12.78.38,"[80, 443]",[cloud],[],Wyze Camera
227,44.236.177.9,"[80, 443]",[cloud],[],Wyze Camera
63,35.81.104.182,"[80, 443]",[cloud],[],Wyze Camera
212,130.211.135.74,"[443, 1443]",[cloud],[],Nest Indoor Camera
177,108.138.128.12,"[80, 443]","[cdn, cloud]",[],Amazon Echo Show
44,52.31.148.72,"[80, 443]",[cloud],[],Arlo Q Indoor Camera
239,54.191.153.236,"[80, 443]",[cloud],[],Wyze Camera
9,54.38.179.187,[443],[],[],Netatmo Camera
125,47.254.89.110,"[80, 443, 8080, 32100]",[cloud],"[CVE-2021-26690, CVE-2018-1302, CVE-2018-1303,...",Yi Indoor Camera
215,34.197.253.215,"[80, 443]",[cloud],[],Amazon Echo Show


In [None]:
IP_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 264 entries, 0 to 263
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   IP            264 non-null    object
 1   open_ports    264 non-null    object
 2   tags          264 non-null    object
 3   vulns         264 non-null    object
 4   camera_model  264 non-null    object
dtypes: object(5)
memory usage: 10.4+ KB


In [None]:
IP_data.to_csv('data/IP_data.csv', index=False)
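
Note that `open_ports`, `tags`, and `vulns` hold Python lists, which `to_csv` writes in their string form (for example `[80, 443]`). When the file is read back, those cells arrive as plain strings. A minimal sketch for turning them back into lists, assuming the cells were written exactly as Python list literals; `parse_list_cell` is a hypothetical helper:

```python
import ast


def parse_list_cell(cell):
    """Parse a CSV cell such as "[80, 443]" back into a Python list."""
    # literal_eval only evaluates literals, so it is safer than eval()
    value = ast.literal_eval(cell)
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {cell!r}")
    return value


ports = parse_list_cell("[80, 443]")
tags = parse_list_cell("['cdn', 'cloud']")
empty = parse_list_cell("[]")
print(ports, tags, empty)
```

Such a function can be passed to `pd.read_csv` through its `converters` argument, for example `converters={'open_ports': parse_list_cell, 'tags': parse_list_cell, 'vulns': parse_list_cell}`.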