Guide and requirements for analysing posts on Instagram
- Install required software
- Scraping
- Data manipulation in R and export to Microsoft Excel
Install required software:
- Python
- Instagram Scraper
- R
- RStudio (optional)
Install Python (Windows):
- Download and install Python 2.7
- Open Command Prompt (on some corporate networks you need to run it as administrator) and add the Python scripts folder to your path:
setx PATH "%PATH%;C:\Python27\Scripts"
Install Xcode tools (Mac):
- Open Terminal and copy/paste the following code:
xcode-select --install
Install Homebrew
- Paste the following code into the terminal window:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Add Homebrew to your path. Paste the following code into the terminal window:
export PATH="/usr/local/bin:/usr/local/sbin:$PATH"
Install Python (2.7) via terminal:
brew install python@2
- Add Python to your path:
export PATH="/usr/local/opt/python@2/libexec/bin:$PATH"
Install Instagram Scraper:
- Open Command Prompt (Windows) or Terminal (Mac)
- Paste the following command:
pip install instagram-scraper
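If `pip` or `instagram-scraper` is reported as not found after these steps, the usual cause is that the scripts folder is missing from the path. The following is a minimal sketch, not part of the original guide, that searches the folders listed in `PATH` for a command. The helper name `find_on_path` is hypothetical, and the code relies only on the `os` module so that it should behave the same under Python 2.7 and Python 3.

```python
import os


def find_on_path(command, path=None):
    """Return the full path of `command` if it sits in a PATH folder, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows, executables carry an extension such as .EXE
    if os.name == "nt":
        extensions = os.environ.get("PATHEXT", ".EXE;.BAT;.CMD").split(";")
    else:
        extensions = [""]
    for folder in path.split(os.pathsep):
        if not folder:
            continue
        for ext in extensions:
            candidate = os.path.join(folder, command + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


if __name__ == "__main__":
    # Report where each tool used in this guide was found, if anywhere
    for name in ("python", "pip", "instagram-scraper"):
        location = find_on_path(name)
        print("{0}: {1}".format(name, location or "NOT FOUND on PATH"))
```

A "NOT FOUND" line suggests repeating the matching "add to your path" step and opening a new Command Prompt or Terminal window before trying again.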
Install packages for R:
- jsonlite
- stringr
- tidyr
- dplyr
- openxlsx
- plyr
- repurrrsive
- purrr
- webshot
Install each package from the R console, e.g.:
install.packages("jsonlite")
Install PhantomJS for R:
webshot::install_phantomjs()
To scrape a user's account:
instagram-scraper username -u yourusername -p yourpassword --media-metadata --comments -d path
To scrape a hashtag:
instagram-scraper hashtag --tag
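When the command is typed or pasted by hand, it is easy to end up with a wrong dash character in a flag (each long option starts with two plain hyphens). As a hedged sketch, the same commands can be assembled as an argument list in Python. It uses only the flags shown in this guide; the function names `build_scraper_command` and `run_scraper` and all placeholder values are assumptions for illustration.

```python
import subprocess


def build_scraper_command(target, login_user=None, login_pass=None,
                          destination=None, is_hashtag=False,
                          with_metadata=True):
    """Build the instagram-scraper argument list for an account or a hashtag."""
    command = ["instagram-scraper", target]
    if is_hashtag:
        command.append("--tag")
    if login_user and login_pass:
        command += ["-u", login_user, "-p", login_pass]
    if with_metadata:
        # Request post metadata and comments, as in the account example
        command += ["--media-metadata", "--comments"]
    if destination:
        command += ["-d", destination]
    return command


def run_scraper(command):
    """Run the scraper; requires instagram-scraper to be on the PATH."""
    return subprocess.call(command)


if __name__ == "__main__":
    account = build_scraper_command("username", "yourusername", "yourpassword",
                                    destination="path")
    hashtag = build_scraper_command("hashtag", is_hashtag=True,
                                    with_metadata=False)
    # Only print the commands here; call run_scraper(account) to execute one
    print(" ".join(account))
    print(" ".join(hashtag))
```

Passing the arguments as a list means a password containing spaces or shell symbols does not need extra quoting.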
The scraper downloads the images for each Instagram post and writes a JSON file with all metadata (tag, post, likes, comments, etc.).
See the Instagram Scraper documentation for the full list of options.
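Before moving on to R, it can help to confirm that the JSON file was written and parses cleanly. The sketch below makes no assumption about the scraper's JSON schema: it only reports the top-level type and, for an object, each key with the size of its value. It assumes a Python 3 interpreter; the function names and the file name passed on the command line are placeholders.

```python
import io
import json
import sys


def summarise_json(data):
    """Return text lines describing the top-level shape of parsed JSON."""
    if isinstance(data, dict):
        lines = ["object with {0} key(s)".format(len(data))]
        for key in sorted(data):
            value = data[key]
            if isinstance(value, (list, dict)):
                detail = "{0} with {1} item(s)".format(
                    type(value).__name__, len(value))
            else:
                detail = type(value).__name__
            lines.append("  {0}: {1}".format(key, detail))
        return lines
    if isinstance(data, list):
        return ["list with {0} item(s)".format(len(data))]
    return [type(data).__name__]


def summarise_json_file(path):
    """Load a JSON file as UTF-8 and summarise its top-level structure."""
    with io.open(path, encoding="utf-8") as handle:
        return summarise_json(json.load(handle))


if __name__ == "__main__":
    # Usage (file name is a placeholder): python summarise.py somejsonfile.json
    if len(sys.argv) > 1:
        for line in summarise_json_file(sys.argv[1]):
            print(line)
```

If the file fails to load, the scrape was probably interrupted and is worth repeating before opening the R project.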
The data manipulation step in R outputs a readable Microsoft Excel file.
- Download this repository and save it in the folder where the scraper saved the images and JSON file
- Open RStudio
- Create new project and choose the same save location
- Open file: instaRanalysis.Rmd
- Change the variable input to the JSON filename (without the .json extension). E.g.:
input <- "somejsonfile"
- Change the variable fileLoc to the full path of the folder where the images from the scrape are saved. In R, write Windows paths with forward slashes or doubled backslashes. E.g.:
fileLoc <- "C:/myfiles/"
- This also works with online drives such as Microsoft OneDrive or SharePoint. E.g.:
fileLoc <- "https://corpname.sharepoint.com/Sites/sitename/Shared%20Documents/images/"
- Run both "chunks" (wait for the first to finish before you start the second)
- The command
webshot(c(df$url),delay = 3, file="InstaShot.png")
downloads a screenshot of each Instagram post. This can take a very long time for large datasets. The command may also exit with an error saying it cannot open a specific link; running the line again usually resolves this.
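The values for `input` and `fileLoc` in the steps above can be derived from the scrape folder rather than typed by hand. The sketch below is an illustration only, assuming the images and the JSON file sit in one local folder: it lists the JSON files there, drops the `.json` extension, and prints R assignments with forward slashes and a trailing slash, matching the examples in this guide. The function names are hypothetical.

```python
import os


def r_input_names(folder):
    """Return JSON file names in `folder` without the .json extension, sorted."""
    names = []
    for entry in os.listdir(folder):
        base, ext = os.path.splitext(entry)
        if ext.lower() == ".json":
            names.append(base)
    return sorted(names)


def r_file_loc(folder):
    """Return `folder` with forward slashes and a trailing slash for use in R."""
    path = folder.replace("\\", "/")
    if not path.endswith("/"):
        path += "/"
    return path


if __name__ == "__main__":
    # Assumption: the script is run from the folder the scraper wrote to
    folder = os.getcwd()
    for name in r_input_names(folder):
        print('input <- "{0}"'.format(name))
    print('fileLoc <- "{0}"'.format(r_file_loc(folder)))
```

The printed lines can be pasted into instaRanalysis.Rmd in place of the example values.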