
Instagram scraping algorithm for collecting JSON and images to identify the wildlife trade in Slow Loris

prak112/data4wildlife

Data 4 Wildlife Hackathon 🛠️

A hackathon (29-30 Jan 2022) focused on developing a digital solution to prevent illegal wildlife trade (IWT) on online social platforms.

  • Team - Sean P. Rogers, Gabriela Youngken 👩‍🎓 👨‍🎓
  • Mentor - Alastair Jamieson 👨‍🏫 (also API-key holder 👛)

Challenge

  • Build a searchable, analyzable benchmark dataset of possible IWT instances and related information from online social platforms 🔚
  • According to the challenge guidelines (Challenge1_Guidelines):
    • A benchmark dataset is a public dataset which is designed and collected for studying real-world data science/research problems.
    • The benchmark dataset should be social media platform agnostic, as IWT happens across multiple platforms such as Instagram and YouTube.
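A platform-agnostic dataset mainly needs one record shape that any platform's posts can be mapped into. The sketch below shows one possible shape; the class name `PostRecord` and every field in it are hypothetical choices for illustration, not a schema defined by the challenge guidelines or by this project.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PostRecord:
    """One benchmark entry. HYPOTHETICAL schema, chosen only for illustration."""

    platform: str                 # source platform, e.g. "instagram" or "youtube"
    post_id: str                  # post/video identifier on that platform
    user_id: str                  # account that published the post
    hashtag: str                  # search term that surfaced the post
    caption: str = ""             # free text accompanying the post
    image_urls: List[str] = field(default_factory=list)
    language: Optional[str] = None       # caption language, if known
    possible_iwt: Optional[bool] = None  # set by a human reviewer; None = not yet reviewed


# An Instagram post and a YouTube video share the same record shape,
# so both can be stored in one table (e.g. a CSV or a DataFrame).
insta = PostRecord(platform="instagram", post_id="123", user_id="u42", hashtag="slowloris")
yt = PostRecord(platform="youtube", post_id="abc", user_id="c7", hashtag="slowloris")
rows = [asdict(r) for r in (insta, yt)]
```

Keeping platform-specific details out of the field names is what lets records from different platforms sit side by side and be searched with the same queries.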

Our Task

  • Collect Instagram posts with images tagged with Slow Loris hashtags (slowloris, slowlorisforsale) to build a benchmark dataset 🏛️

  • Task Duration - 26 hours 🏃⏲️

Our Approach 🏗️

  • Manually identify Slow Loris hashtags 🐵 to collect example data
  • Call an Instagram API (instagram85 on RapidAPI) to fetch the feed for each hashtag
  • Collect the JSON response (first page only), extract the images and label each image by user ID
  • Save the images in folders labelled by language (see Future Prospects)
  • Repeat the API calls and collect the images
  • Load the JSON into a webpage (index.html) for human validation of the images
  • Manually validate the images and export a CSV file with information from the comments
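The extraction and labelling steps above can be sketched as a function that turns one page of feed JSON into (image URL, local path) pairs: one folder per language, with file names labelled by user ID. This is a minimal sketch under a loudly stated assumption: the response shape (`data`, `id`, `user_id`, `language`, `image_urls`) is hypothetical, and the real instagram85 schema may differ. The function name `plan_image_paths` is also hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def plan_image_paths(page: dict, root: str = "images") -> List[Tuple[str, str]]:
    """Map each image in one page of hashtag-feed JSON to a local file path.

    ASSUMED response shape (hypothetical; the real instagram85 schema may differ):
        {"data": [{"id": ..., "user_id": ..., "language": ..., "image_urls": [...]}]}
    """
    pairs = []
    for post in page.get("data", []):
        user_id = str(post.get("user_id", "unknown"))
        post_id = str(post.get("id", "noid"))
        language = post.get("language") or "unknown"  # fall back when no language is known
        for i, url in enumerate(post.get("image_urls", [])):
            # One folder per language; file name labelled by user ID,
            # plus post ID and image index so names stay unique.
            path = PurePosixPath(root) / language / f"{user_id}_{post_id}_{i}.jpg"
            pairs.append((url, str(path)))
    return pairs


# Made-up first page; downloading is left out so the sketch runs offline.
sample_page = {
    "data": [
        {"id": "p1", "user_id": "u42", "language": "en",
         "image_urls": ["https://example.com/a.jpg", "https://example.com/b.jpg"]},
        {"id": "p2", "user_id": "u7", "language": None,
         "image_urls": ["https://example.com/c.jpg"]},
    ]
}
for url, path in plan_image_paths(sample_page):
    print(url, "->", path)
# https://example.com/a.jpg -> images/en/u42_p1_0.jpg
# https://example.com/b.jpg -> images/en/u42_p1_1.jpg
# https://example.com/c.jpg -> images/unknown/u7_p2_0.jpg
```

To actually save the files, each pair could be passed to `urllib.request.urlretrieve(url, path)` after creating the folder with `os.makedirs(os.path.dirname(path), exist_ok=True)`. Where the language value comes from (an API field or a separate language-detection step) is also an assumption here.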

Future Prospects 👀

  • Call the API recursively with 'next_page_id' to collect all pages
  • Depending on the image volume, the project could evolve into image recognition to automate the process
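Following 'next_page_id' until it runs out can be sketched as below. Two things are swapped in plainly: an iterative loop replaces the recursion mentioned above (so a long feed cannot hit Python's recursion limit), and a `max_pages` cap limits API usage. The `fetch` callable is a hypothetical wrapper around the actual API request, and the assumption that `next_page_id` sits at the top level of each response and is empty on the last page is unverified.

```python
from typing import Callable, Iterator, Optional

# fetch(hashtag, page_id) -> decoded JSON for one page (HYPOTHETICAL wrapper).
FetchFn = Callable[[str, Optional[str]], dict]


def iter_pages(fetch: FetchFn, hashtag: str, max_pages: int = 100) -> Iterator[dict]:
    """Yield successive pages for a hashtag by following 'next_page_id'.

    ASSUMPTION: the cursor is a top-level "next_page_id" key that is
    missing or falsy on the last page.
    """
    page_id: Optional[str] = None  # None requests the first page
    for _ in range(max_pages):      # hard cap protects the API quota
        page = fetch(hashtag, page_id)
        yield page
        page_id = page.get("next_page_id")
        if not page_id:
            break


# Offline stand-in for the real API: three pages linked by next_page_id.
_FAKE_PAGES = {
    None: {"data": [1, 2], "next_page_id": "p2"},
    "p2": {"data": [3], "next_page_id": "p3"},
    "p3": {"data": [4], "next_page_id": None},
}


def fake_fetch(hashtag: str, page_id: Optional[str]) -> dict:
    return _FAKE_PAGES[page_id]


items = [x for page in iter_pages(fake_fetch, "slowloris") for x in page["data"]]
print(items)  # [1, 2, 3, 4]
```

Because `iter_pages` is a generator, each page can be processed (for example, by the image-extraction step) as soon as it arrives instead of holding the whole feed in memory.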

Key Takeaways

  • Focus on the bigger picture 🌄
  • Build one-block-at-a-time 🧱
  • Have consistent breaks 😌
