# Scraping your Instagram Account

**Instagraph** analyzes the **relations** between **you**, your **followers** and the accounts you are **following**. The **first step** is therefore to **retrieve** this **information** from the **Instagram network**.  
[**InstaPy**](https://github.com/timgrossmann/InstaPy) is a **tool** which enables you to **interact** with your account **automatically**. For **more details** you can refer to its [**documentation**](https://github.com/timgrossmann/InstaPy/blob/master/DOCUMENTATION.md).  

## Following/Followers Relations

The script below selects **all** the **users** in your **followers or following** lists and, **for each** of them, **stores** their **following list** locally.  
We use the following list of each user because it is generally **smaller than** the **followers** one, especially for well-known accounts (e.g. influencers). Restricting the scrape to following lists still does **not omit links** between **any pair of users** connected to your profile: if one of them follows another, that link appears in the first user's following list.  
Depending on how many followers and following you have, the script may take **several hours** to complete.
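
The claim that following lists alone capture every link can be checked with a minimal sketch. The helper `followers_from_following` and the toy usernames are hypothetical and not part of Instagraph or InstaPy; the sketch only shows that inverting the following lists recovers who follows whom among the scraped users.

```python
from collections import defaultdict


def followers_from_following(following):
    """Invert a {user: [accounts that user follows]} mapping.

    Returns {account: set of scraped users who follow that account}.
    """
    followers = defaultdict(set)
    for user, followed_accounts in following.items():
        for followed in followed_accounts:
            followers[followed].add(user)
    return dict(followers)


# Hypothetical toy network (made-up usernames)
following = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": [],
}

followers = followers_from_following(following)
print(followers["alice"])  # {'bob'}
```

Every directed link between two scraped users shows up exactly once, in the following list of the user it starts from, so no separate scrape of followers lists is needed.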

In [None]:
from instapy import InstaPy
from instapy import smart_run
from instapy import set_workspace


# set workspace folder at desired location (default is at your home folder)
set_workspace(path=None)

insta_username = "your_username"
insta_password = "your_password"

# get an InstaPy session!
session = InstaPy(username=insta_username, password=insta_password, headless_browser=True)

with smart_run(session):
    # grab your own followers and following lists
    my_followers = session.grab_followers(username=insta_username, amount="full", live_match=True, store_locally=True)
    my_following = session.grab_following(username=insta_username, amount="full", live_match=True, store_locally=True)

    # for every distinct user connected to your profile, store their following list locally
    for user in set(my_followers) | set(my_following):
        session.grab_following(username=user, amount="full", live_match=False, store_locally=True)


If you have not modified the workspace, your home directory will contain a folder called `InstaPy`. Inside it, at the location `logs/your_username/relationship_data/following`, you will find all the files you are interested in. Each of them is in `json` format and contains only an array of strings.
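
Before merging, it can help to confirm that a scraped file really has this shape. The following is a minimal sketch; `load_following` is a hypothetical helper, and the demo runs on a made-up file in a temporary directory rather than on real InstaPy output.

```python
import json
import tempfile
from pathlib import Path


def load_following(file_path):
    """Load one scraped file and check that it is a JSON array of strings."""
    with open(file_path) as f:
        usernames = json.load(f)
    if not isinstance(usernames, list) or not all(isinstance(u, str) for u in usernames):
        raise ValueError(f"{file_path} is not a JSON array of strings")
    return usernames


# Demo on a made-up file; point load_following at one of your own files instead
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "some_user.json"
    sample.write_text(json.dumps(["friend_one", "friend_two"]))
    print(load_following(sample))
```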

## Merge Data

In order to **correctly import** the **scraped data** into **Apache Spark**, a further step is needed: **merging** all the files into a single `data.json` which has **usernames** as **keys** and **arrays of following** as **values**.  
Assuming the name of each `json` file is the username of the corresponding account, this script does the trick:

In [1]:
import json
import os

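# default InstaPy workspace location; adjust it if you set a custom workspace
path = os.path.join(os.path.expanduser("~"), "InstaPy", "logs", "your_username", "relationship_data", "following")
data = {}

for name in os.listdir(path):
    # the file name without its .json extension is assumed to be the account username
    username, extension = os.path.splitext(name)
    if extension != ".json":
        continue
    # the context manager closes each file after it has been read
    with open(os.path.join(path, name)) as input_file:
        data[username] = json.load(input_file)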

with open('src/main/resources/your_username.json', 'w') as f:
    json.dump(data, f)
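
As a quick sanity check on the merged mapping, a small sketch like the one below counts the users and the links it contains. The `summarize` helper and the sample data are hypothetical; with real data you would pass the `data` dictionary built by the merge script.

```python
def summarize(data):
    """Return (number of users, total number of following links) for a merged mapping."""
    users = len(data)
    links = sum(len(following) for following in data.values())
    return users, links


# Made-up example with the same shape as the merged file
sample_data = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
}

print(summarize(sample_data))  # (2, 3)
```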