# Simple test notebook to scrape some submissions and put them into a Neo4j database

* Before running, follow the setup steps in the README. Make sure you have edited `config.txt` and moved it to `/etc/` (or to a location of your choice, in which case update the path in `scraper.py` and `data_loader.py`).

In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

In [1]:
from data_loader import Data_Loader
from data_viewer import Data_Viewer
from annotator import Annotator
import time

# Scraping data

In [25]:
# A list of URLs of the submissions that you want to add to your graph.
# These should be top-level posts (not links to comments).
submissions = ['https://www.reddit.com/r/sanfrancisco/comments/bs5f69/just_had_the_elementary_school_lottery_explained/']
# Below is the full list of submissions I'm currently using for the school choice project
# submissions = [
#     'https://www.reddit.com/r/sanfrancisco/comments/bs5f69/just_had_the_elementary_school_lottery_explained/',
#     'https://www.reddit.com/r/sanfrancisco/comments/7r3cy3/how_the_san_francisco_school_lottery_works_and/',
#     'https://www.reddit.com/r/sanfrancisco/comments/4ah4no/fuck_the_sf_school_lottery_thats_all/',
#     'https://www.reddit.com/r/sanfrancisco/comments/b5kbse/how_the_student_assignment_system_works_sfusd/',
#     'https://www.reddit.com/r/sanfrancisco/comments/9hh9z8/two_sf_school_board_members_to_introduce/',
#     'https://www.reddit.com/r/sanfrancisco/comments/4646v8/experience_with_enrolling_in_sfusd_school/',
#     'https://www.reddit.com/r/sanfrancisco/comments/a5nrej/sf_school_board_plans_to_replace_muchcriticized/',
#     'https://www.reddit.com/r/sanfrancisco/comments/bhcxhb/san_francisco_had_an_ambitious_plan_to_tackle/',
#     'https://www.reddit.com/r/sanfrancisco/comments/5e5834/i_made_a_website_of_sf_elementary_school_test/',
#     'https://www.reddit.com/r/sanfrancisco/comments/cg5coh/sfusd_kindergarten/'
# ]
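
Each submission URL above carries a short id in the path segment right after `comments` (for the first URL, `bs5f69`), and that id is what the querying and coding steps later in this notebook refer to. As a minimal sketch, the helper below pulls the id out of a URL using only the standard library. The function name `submission_id_from_url` is hypothetical and is not part of `data_loader.py` or `scraper.py`.

```python
from urllib.parse import urlparse


def submission_id_from_url(url: str) -> str:
    """Return the path segment that follows 'comments' in a submission URL."""
    # Expected path shape: /r/<subreddit>/comments/<id>/<slug>/
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "comments" not in parts:
        raise ValueError(f"not a submission URL: {url}")
    idx = parts.index("comments")
    if idx + 1 >= len(parts):
        raise ValueError(f"no id after 'comments' in: {url}")
    return parts[idx + 1]


url = ("https://www.reddit.com/r/sanfrancisco/comments/bs5f69/"
       "just_had_the_elementary_school_lottery_explained/")
print(submission_id_from_url(url))  # bs5f69
```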

We now create `dl`, the `Data_Loader` object. The constructor opens a connection to the database and also creates a `Scraper` object, which connects to the Reddit API. For this to run, the Neo4j database must be running and the credentials file must be set up correctly.

In [27]:
dl = Data_Loader()

In [6]:
dl.clear_graph()

In [None]:
dl.add_submissions(submissions)

# Querying and coding

To query and view our data, we use a `Data_Viewer`:

In [30]:
dv = Data_Viewer()

For example, we can look up a submission by its id:

In [31]:
print(dv.view_submission("bs5f69"))

[Submission bs5f69]
 justasapling: 
 Just had the elementary school lottery explained to me. 
 And I'm hoping that they were wrong.

The way they explained it, it sounds like being *in the neighborhood* does nothing to get you into *the neighborhood school*.

What, then, is the point of neighborhood schools?


Then, to add codes to our data, we use the `Annotator.annotate()` method. Codes should be formatted as `"code1: subcode1: subcodesubcode1; code2: subcode2; code3"`, and so on: semicolons separate codes, and colons separate a code from its subcodes. The annotator needs to know the id and type of the content node you're annotating, as well as the specific substring you'd like to code.
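
To make the code-string format concrete, here is a small sketch that splits such a string into one list per code, ordered from the top-level code down to its deepest subcode. This only illustrates the format described above; `parse_codes` is a hypothetical helper, and how `Annotator` actually parses or stores codes is not shown in this notebook.

```python
from typing import List


def parse_codes(code_string: str) -> List[List[str]]:
    """Split 'a: b; c' into [['a', 'b'], ['c']], trimming whitespace."""
    paths = []
    for entry in code_string.split(";"):
        # Each entry is a chain of code, subcode, sub-subcode, ...
        levels = [level.strip() for level in entry.split(":")]
        levels = [level for level in levels if level]
        if levels:
            paths.append(levels)
    return paths


print(parse_codes("stress; priorities: travel logistics"))
# [['stress'], ['priorities', 'travel logistics']]
```

Running a code string through a check like this before annotating can catch a stray `;` or `:` early.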

In [32]:
a = Annotator()

In [33]:
a.annotate(code="stress; priorities: travel logistics",
           content_id="bs5f69",
           content_type="Submission",
           content="The way they explained it")
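
Because the `content` argument has to be a substring of the node's text, it can help to confirm the fragment really occurs there before annotating. The sketch below is one way to do that with plain string search. `locate_fragment` is a hypothetical helper, and whether `Annotator.annotate()` performs a similar check itself is an assumption not confirmed by this notebook.

```python
from typing import Tuple


def locate_fragment(text: str, fragment: str) -> Tuple[int, int]:
    """Return (start, end) offsets of the first occurrence of fragment in text."""
    start = text.find(fragment)
    if start == -1:
        raise ValueError(f"fragment not found: {fragment!r}")
    return start, start + len(fragment)


# Sentence copied from the submission text printed earlier in this notebook.
sentence = ("The way they explained it, it sounds like being *in the neighborhood* "
            "does nothing to get you into *the neighborhood school*.")
print(locate_fragment(sentence, "The way they explained it"))  # (0, 25)
```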