This notebook demonstrates using urllib to scrape public supplier and contract information, then format and save the output using pandas and JSON Lines.


In [1]:
import pandas as pd
import urllib.request
import jsonlines

Download the list of panel suppliers and save the HTML to a local file

In [11]:
ict_panel_url = 'https://www.tenders.gov.au/?event=public.SON.view&SONUUID=7A6BD483-91A0-5927-C7044D04D7E413EB'

# urlretrieve returns a (local filename, headers) tuple, not a file object
filename, headers = urllib.request.urlretrieve(ict_panel_url, "ict_panel_suppliers.html")
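`urlretrieve` belongs to the "legacy interface" section of the `urllib.request` documentation. A minimal sketch of the same download using `urllib.request.urlopen` in a context manager, streaming the body to disk; the `download` helper name, the 30-second timeout and the `User-Agent` value are illustrative assumptions, not something the tenders site is known to require.

```python
import shutil
import urllib.request


def download(url, path, user_agent="Mozilla/5.0 (notebook scraper)"):
    """Hypothetical helper: stream the response body at `url` into `path`.

    The User-Agent value is an assumption; some sites reject urllib's default.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Both the HTTP response and the output file are closed on exit.
    with urllib.request.urlopen(request, timeout=30) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


# Usage (requires network access):
# download(ict_panel_url, "ict_panel_suppliers.html")
```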


Read the supplier table from the locally stored HTML file into a DataFrame

In [12]:
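# read_html returns a list of DataFrames, one per <table> matching attrs
t = pd.read_html("ict_panel_suppliers.html", attrs={'class': 'genT'}, header=0)
suppliers_df = t[0]
# Index by ABN while keeping ABN as a regular column for the JSON export
suppliers_df.index = suppliers_df.ABN

print(f'Suppliers: {suppliers_df.shape[0]}')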

Suppliers: 204
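Indexing by `ABN` assumes each supplier's ABN appears only once in the table. A minimal sketch of an explicit check, run on three rows copied from the `head()` output in this notebook; `set_index('ABN', drop=False)` is an equivalent way to index by ABN while keeping it as an ordinary column.

```python
import pandas as pd

# Three rows copied from the supplier table shown in this notebook.
sample = pd.DataFrame({
    "Supplier Name": ["4DATA Hall 1 Pty Ltd (trading as 4DATA)",
                      "A&A Testing Consultants Pty Ltd",
                      "Access Testing Pty Ltd trading as AccessHQ"],
    "ABN": ["51 137 899 346", "95 127 484 904", "13 069 942 552"],
    "State": ["ACT", "NSW", "ACT"],
    "Postcode": [2609, 2620, 2602],
})

# drop=False keeps ABN as a column too, so it still appears in
# to_json(orient='records') output, which omits the index.
indexed = sample.set_index("ABN", drop=False)

# Duplicate ABNs would make indexed.loc[abn] return several rows.
if not indexed.index.is_unique:
    dupes = indexed.index[indexed.index.duplicated()].unique()
    raise ValueError(f"Duplicate ABNs: {list(dupes)}")

print(indexed.loc["95 127 484 904", "Supplier Name"])
```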


In [13]:
suppliers_df.head()

                Supplier Name                               ABN             State  Postcode
ABN
51 137 899 346  4DATA Hall 1 Pty Ltd (trading as 4DATA)     51 137 899 346  ACT    2609
95 127 484 904  A&A Testing Consultants Pty Ltd             95 127 484 904  NSW    2620
13 069 942 552  Access Testing Pty Ltd trading as AccessHQ  13 069 942 552  ACT    2602
52 150 446 521  AccessibilityOz Pty Ltd                     52 150 446 521  VIC    3000
18 120 507 137  Accessity Pty Ltd                           18 120 507 137  ACT    2617


Dump to JSON Lines format (one JSON object per line)

In [14]:
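# lines=True writes one JSON object per line (JSON Lines). Stripping '[' and ']'
# from the records array by string replacement would also corrupt any value
# containing those characters or '},{'.
suppliers_df.to_json("ict_panel_suppliers.jsonl", orient='records', lines=True)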
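Post-processing the output of `to_json(orient='records')` with string replacements is fragile, because the replacements also hit brackets inside field values; pandas can emit JSON Lines directly with `lines=True`. A minimal sketch comparing the two approaches on a single record whose supplier name and ABN are hypothetical, chosen only to contain brackets.

```python
import json

import pandas as pd

# Hypothetical record: the name and ABN are made up to include brackets.
df = pd.DataFrame({
    "Supplier Name": ["Example [Holdings] Pty Ltd"],
    "ABN": ["00 000 000 000"],
    "State": ["ACT"],
    "Postcode": [2600],
})

# Fragile: strip the array brackets with string replacement. This also
# deletes the brackets inside the supplier name.
fragile = (df.to_json(orient="records")
           .replace("},{", "}\n{").replace("[", "").replace("]", ""))
fragile_rows = [json.loads(line) for line in fragile.splitlines() if line]

# Robust: let pandas emit one JSON object per line.
robust = df.to_json(orient="records", lines=True)
robust_rows = [json.loads(line) for line in robust.splitlines() if line]

print(fragile_rows[0]["Supplier Name"])  # brackets lost
print(robust_rows[0]["Supplier Name"])   # brackets kept
```

Each line of a JSON Lines file is an independent JSON document, which is why the standard `json.loads` can parse it line by line without the `jsonlines` package.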

Confirm the file can be read

In [15]:
# A jsonlines Reader is iterable; the with-block closes the file
with jsonlines.open("ict_panel_suppliers.jsonl") as reader:
    for record in reader:
        print(record)


{'Postcode': 2609, 'Supplier Name': '4DATA Hall 1 Pty Ltd (trading as 4DATA)', 'State': 'ACT', 'ABN': '51 137 899 346'}
{'Postcode': 2620, 'Supplier Name': 'A&A Testing Consultants Pty Ltd', 'State': 'NSW', 'ABN': '95 127 484 904'}
{'Postcode': 2602, 'Supplier Name': 'Access Testing Pty Ltd trading as AccessHQ', 'State': 'ACT', 'ABN': '13 069 942 552'}
{'Postcode': 3000, 'Supplier Name': 'AccessibilityOz Pty Ltd', 'State': 'VIC', 'ABN': '52 150 446 521'}
{'Postcode': 2617, 'Supplier Name': 'Accessity Pty Ltd', 'State': 'ACT', 'ABN': '18 120 507 137'}
{'Postcode': 2600, 'Supplier Name': 'ACSPRO Pty Ltd', 'State': 'ACT', 'ABN': '49 800 667 911'}
{'Postcode': 3004, 'Supplier Name': 'Adaps IT Pty Ltd', 'State': 'VIC', 'ABN': '50 169 520 478'}
{'Postcode': 2612, 'Supplier Name': 'Adelphi Digital Consulting Group Pty Ltd', 'State': 'ACT', 'ABN': '43 096 505 805'}
{'Postcode': 2602, 'Supplier Name': 'AGIS Group Pty Ltd', 'State': 'ACT', 'ABN': '34 129 384 032'}
{'Postcode': 2600, 'Supplier Na