
99 Neighborhood Council Websites Technologies Used Analysis

Willa Mannering edited this page May 26, 2022 · 11 revisions

Overview

Project to create a scraper that gets information from builtwith.com on the technologies used by the 99 neighborhood council (NC) websites.

Optional

Automate the scrape job to run periodically.

Requirements

  • Be able to run the script on demand
  • Gather the following information
    • Name of Tech on each NC site (their entire site, not just the homepage)
    • URL of Tech
    • Category of Tech

Nice to have (if not provided, we can do it ourselves with data above)

  • Total NCs using Tech
  • Total Categories
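Both totals can be derived from the per-site records listed under Requirements. The following is a minimal sketch of that aggregation, not the project's actual code. It assumes each scraped record is a dict with the hypothetical keys `nc`, `tech_name`, `tech_url` and `tech_category`; the real field names in the project's JSON output may differ, and the sample values are placeholders.

```python
from collections import defaultdict


def summarize(records):
    """Compute the 'nice to have' totals from per-site tech records.

    Each record is assumed (hypothetically) to look like:
    {"nc": ..., "tech_name": ..., "tech_url": ..., "tech_category": ...}
    Returns (dict of tech name -> number of NCs using it, number of categories).
    """
    ncs_per_tech = defaultdict(set)
    categories = set()
    for rec in records:
        # A set counts each NC once, even if a tech appears on several of its pages.
        ncs_per_tech[rec["tech_name"]].add(rec["nc"])
        categories.add(rec["tech_category"])
    total_ncs_using_tech = {tech: len(ncs) for tech, ncs in ncs_per_tech.items()}
    return total_ncs_using_tech, len(categories)


if __name__ == "__main__":
    # Placeholder records for illustration only, not real scrape results.
    sample = [
        {"nc": "NC A", "tech_name": "ExampleCMS", "tech_url": "https://example.com/cms", "tech_category": "CMS"},
        {"nc": "NC A", "tech_name": "ExampleCMS", "tech_url": "https://example.com/cms", "tech_category": "CMS"},
        {"nc": "NC B", "tech_name": "ExampleCMS", "tech_url": "https://example.com/cms", "tech_category": "CMS"},
        {"nc": "NC B", "tech_name": "ExampleStats", "tech_url": "https://example.com/stats", "tech_category": "Analytics"},
    ]
    counts, n_categories = summarize(sample)
    print(counts, n_categories)
```

Using a set per tech keeps duplicate rows (the same tech found on several pages of one NC site) from inflating the count.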

History

  • 2021-07-01 New issue created https://github.com/hackforla/data-science/issues/44
  • 2021-07-08 Accessing the API didn’t return the info required for the project, so we will use selenium to scrape
  • 2021-08-09 Sophia ran a video tutorial session on scraping with Selenium and shared some starter code. The next step is for the person assigned to the issue to parse the output into a usable format and save it as a file.
  • 2021-08-30 Abe joined as the CoP PM and said he would get up to speed and then move the issue forward.
  • 2021-09-13 Rajinder assigned
  • Rajinder creates a web-scraping script for all the websites. It includes a Dockerfile and produces a JSON file as its output. He adds the code to the DS repository.
  • Sofia helps Rajinder solve the API rate-limiting problem by having the script hit the API only once every 30 seconds.
  • Rajinder updates his personal version of the script.
  • Currently waiting for him to update the DS repository with these changes (in the meantime, Ryan has saved the code from Rajinder's repo, just in case).
  • Willa updated Rajinder's script to also include tech URL and tech category
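The history notes that the script waits 30 seconds between requests and writes its results to a JSON file that includes the tech name, URL and category. Below is a minimal sketch of that throttling pattern under stated assumptions; it is not Rajinder's or Willa's actual code. `scrape_all` and `fetch_tech` are hypothetical names: `fetch_tech` stands in for whatever Selenium logic extracts a site's technologies, and keying the output by site URL is an assumption.

```python
import json
import time


def scrape_all(urls, fetch_tech, out_path, min_interval=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Fetch tech data for each NC site, spacing requests >= min_interval seconds apart.

    fetch_tech(url) is a hypothetical callable (e.g. wrapping a Selenium session)
    that returns a list of dicts with "tech_name", "tech_url" and "tech_category".
    Results are written to out_path as JSON, keyed by site URL (an assumption).
    """
    results = {}
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait only for the part of the interval not already spent fetching.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results[url] = fetch_tech(url)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    return results
```

Passing `sleep` and `clock` as parameters lets the throttle be exercised with a fake clock instead of waiting 30 real seconds per site; in normal use the defaults from the `time` module apply.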

Artifacts

OCS: Builtwith data on 99 NCs technologies

Updated spreadsheet, OCS: Builtwith tech_table

Resources/Instructions

External Tools

Tutorial

Project input (data)

Project output

Rajinder's code

Related issues

Open Community Survey
Data Science Community of Practice

Past Collaborators

@akibrhast, @ava li, @Sarah Williams, @wendywilhelm10, @rajindermavi, @ShikaZzz, @JessicaFB, @Poorvi Rao

Current Collaborators

@kalyaniraman, @akhaleghi, @ryanswan, @salice
