Updated to scrape For Sale category
meub committed Jul 15, 2017
1 parent bf0dc59 commit 54f6179
Showing 15 changed files with 228 additions and 404 deletions.
4 changes: 0 additions & 4 deletions .dockerignore

This file was deleted.

39 changes: 0 additions & 39 deletions Dockerfile

This file was deleted.

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,5 +1,5 @@
The MIT License (MIT)
Copyright (c) 2016 Vik Paruchuri
Copyright (c) 2017 Alex Meub

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

107 changes: 25 additions & 82 deletions README.md
@@ -1,101 +1,44 @@
Apartment finder
Craigslist For Sale Alerts
-------------------

This repo contains the code for a bot that will scrape Craigslist for real-time listings matching specific criteria, then alert you in Slack. This will let you quickly see the best new listings, and contact the owners. You can adjust the settings to change your price range, what neighborhoods you want to look in, and what transit stations and other points of interest you'd like to be close to.
Craigslist For Sale Alerts is a bot that scrapes Craigslist in real time for for-sale postings matching specific criteria. When it finds a listing that it hasn't already seen, it alerts you via Slack and/or email.

I successfully used this tool to find an apartment when I moved from Boston to SF. It saved a good amount of time and money. Read more about it [here](https://www.dataquest.io/blog/apartment-finding-slackbot/).
It can optionally read from an external JSON list of search URLs so that search criteria can be updated independently.

It's recommended to follow the Docker installation and usage instructions.
This project is adapted from [apartment finder](https://github.com/VikParuchuri/apartment-finder) by Vik Paruchuri.

Settings
--------------------

Look in `settings.py` for a full list of all the configuration options. Here's a high level overview:

* `MIN_PRICE` -- the minimum listing price you want to search for.
* `MAX_PRICE` -- the maximum listing price you want to search for.
* `CRAIGSLIST_SITE` -- the regional Craigslist site you want to search in.
* `AREAS` -- a list of areas of the regional Craigslist site that you want to search in.
* `BOXES` -- coordinate boxes of the neighborhoods you want to look in.
* `NEIGHBORHOODS` -- if the listing doesn't have coordinates, a list of neighborhoods to match on.
* `MAX_TRANSIT_DISTANCE` -- the farthest you want to be from a transit station.
* `TRANSIT_STATIONS` -- the coordinates of transit stations.
* `CRAIGSLIST_HOUSING_SECTION` -- the subsection of Craigslist housing that you want to look in.
* `SLACK_CHANNEL` -- the Slack channel you want the bot to post in.

External Setup
Deployment
--------------------

Before using this bot, you'll need a Slack team, a channel for the bot to post into, and a Slack API key:

* Create a Slack team, which you can do [here](https://slack.com/create#email).
* Create a channel for the listings to be posted into. [Here's](https://get.slack.help/hc/en-us/articles/201402297-Creating-a-channel) help on this. It's suggested to use `#housing` as the name of the channel.
* Get a Slack API token, which you can do [here](https://api.slack.com/docs/oauth-test-tokens). [Here's](https://get.slack.help/hc/en-us/articles/215770388-Creating-and-regenerating-API-tokens) more information on the process.
# copy all files
scp *.py user@domain:/path/to/project/directory
scp listings.db user@domain:/path/to/project/directory

Configuration
--------------------

## Docker

* Create a folder called `config`, then put a file called `private.py` inside.
* Specify new values for any of the settings above in `private.py`.
* For example, you could put `AREAS = ['sfc']` in `private.py` to only look in San Francisco.
* If you want to post into a Slack channel not called `housing`, add an entry for `SLACK_CHANNEL`.
* If you don't want to look in the Bay Area, you'll need to update the following settings at the minimum:
* `CRAIGSLIST_SITE`
* `AREAS`
* `BOXES`
* `NEIGHBORHOODS`
* `TRANSIT_STATIONS`
* `CRAIGSLIST_HOUSING_SECTION`
* `MIN_PRICE`
* `MAX_PRICE`

## Manual

* Create a file called `private.py` in this folder.
* Add a value called `SLACK_TOKEN` that contains your Slack API token.
* Add any other values you want to `private.py`.

Installation + Usage
--------------------
# install modules
pip install -r requirements.txt

## Docker
# set permissions
chmod +x main_loop.py
chmod +x scraper.py
chmod +x settings.py
chmod +x util.py
chmod +x listings.db

* Make sure to do the steps in the configuration section above first.
* Install Docker by following [these instructions](https://docs.docker.com/engine/installation/).
* To run the program with the default configuration:
* `docker run -d -e SLACK_TOKEN={YOUR_SLACK_TOKEN} dataquestio/apartment-finder`
* To run the program with your own configuration:
* `docker run -d -e SLACK_TOKEN={YOUR_SLACK_TOKEN} -v {ABSOLUTE_PATH_TO_YOUR_CONFIG_FOLDER}:/opt/wwc/apartment-finder/config dataquestio/apartment-finder`

## Manual
# start as service
nohup python main_loop.py &

* Look in the `Dockerfile`, and make sure you install any of the apt packages listed there.
* Install Python 3 using Anaconda or another method.
* Install the Python requirements with `pip install -r requirements.txt`.
* Run the program with `python main_loop.py`. Results will be posted to your #Housing channel if successful.
# find it again
ps ax | grep main_loop.py

Troubleshooting
---------------------
# kill it
kill -9 process-id

## Docker

* Use `docker ps` to get the id of the container running the bot.
* Run `docker exec -it {YOUR_CONTAINER_ID} /bin/bash` to get a command shell inside the container.
* Run `sqlite listings.db` to run the sqlite command line tool and inspect the database state (the only table is also called `listings`).
* `select * from listings` will get all of the stored listings.
* If nothing is in the database, you may need to wait for a bit, or verify that your settings aren't too restrictive and aren't finding any listings.
* You can see how many listings are being found by looking at the logs.
* Inspect the logs using `tail -f -n 1000 /opt/wwc/logs/afinder.log`.

## Manual
Testing
--------------------

* Look at the stdout of the main program.
* Inspect `listings.db` to ensure listings are being added.
The `remove_listing.py` script is useful for testing your alerts: it deletes a record from your local database so that the listing counts as unseen again, letting you check that your alert fires as you expect. It also prints the 15 most recent items in your local database for debugging purposes.

Deploying
---------------------

* Create a server that has Docker installed. It's suggested to use Digital Ocean.
* Follow the configuration + installation instructions for Docker above.
3 changes: 3 additions & 0 deletions craigslist_urls.json
@@ -0,0 +1,3 @@
[
"https://portland.craigslist.org/search/sss?query=super+nintendo&sort=rel&hasPic=1&postedToday=1&search_distance=15&postal=97205"
]
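As a rough illustration of how a file like `craigslist_urls.json` might be consumed, the Python 3 sketch below parses a JSON array of search URLs and splits each one into the regional site, the search category, and the query filters. The helper `parse_search_url` is hypothetical and not part of this commit; the scraper's actual loading code is not shown in this diff.

```python
import json
from urllib.parse import urlparse, parse_qs

# Sample content mirroring craigslist_urls.json from this commit.
SAMPLE = '''[
  "https://portland.craigslist.org/search/sss?query=super+nintendo&sort=rel&hasPic=1&postedToday=1&search_distance=15&postal=97205"
]'''


def parse_search_url(url):
    """Hypothetical helper: split a search URL into site, category and filters."""
    parts = urlparse(url)
    site = parts.netloc.split(".")[0]                  # subdomain, e.g. "portland"
    category = parts.path.rstrip("/").split("/")[-1]   # last path segment, e.g. "sss"
    # parse_qs returns a list per parameter; keep the first value of each.
    filters = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"site": site, "category": category, "filters": filters}


searches = [parse_search_url(url) for url in json.loads(SAMPLE)]
```

Keeping the search criteria in plain URLs means a search can be tuned in the Craigslist web interface and pasted into the JSON file without touching the Python code.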
6 changes: 0 additions & 6 deletions deployment/deploy.sh

This file was deleted.

9 changes: 0 additions & 9 deletions deployment/scraper.conf

This file was deleted.

13 changes: 0 additions & 13 deletions deployment/setup_scraper.sh

This file was deleted.

8 changes: 0 additions & 8 deletions deployment/supervisord.conf

This file was deleted.

main_loop.py: file mode changed 100644 → 100755 (no content changes)
61 changes: 61 additions & 0 deletions remove_listing.py
@@ -0,0 +1,61 @@
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, DateTime, Float, Boolean
from sqlalchemy.orm import sessionmaker
from dateutil.parser import parse
from util import post_listing_to_slack, send_listing_email
from slackclient import SlackClient
from urlparse import urlparse
from bs4 import BeautifulSoup
from email.mime.text import MIMEText

import time
import settings
import requests
import sys
import json
import smtplib

## Put Craigslist ID of the Craigslist item to delete here
cl_id_to_delete = 1234567890

engine = create_engine('sqlite:///listings.db', echo=False)
Base = declarative_base()

class Listing(Base):
    """
    A table to store data on craigslist listings.
    """

    __tablename__ = 'listings'

    id = Column(Integer, primary_key=True)
    link = Column(String, unique=True)
    created = Column(DateTime)
    name = Column(String)
    price = Column(Float)
    cl_id = Column(Integer, unique=True)

Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

rows = session.query(Listing).count()
print "%s rows in database" % (rows)

listings = session.query(Listing).order_by(Listing.id.desc())

i = 0
for listing in listings:
    if i >= 15:  # stop after the 15 most recent listings
        break
    print "%s: %s - %s" % (listing.id, listing.name, listing.cl_id)
    i += 1

session.query(Listing).filter_by(cl_id=cl_id_to_delete).delete()
session.commit()
print
print "Deleted: %s " % (cl_id_to_delete)

rows = session.query(Listing).count()
print "%s rows in database" % (rows)
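To see why deleting a record is a useful test, here is a minimal Python 3 sketch of a "seen before" check. It uses the standard-library `sqlite3` module and an in-memory database rather than SQLAlchemy and `listings.db`. The helper `record_if_new` and the simplified table are assumptions made for illustration; they are not the scraper's actual code.

```python
import sqlite3

# In-memory stand-in for listings.db with a reduced set of columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, name TEXT, cl_id INTEGER UNIQUE)"
)


def record_if_new(conn, cl_id, name):
    """Hypothetical helper: store the listing and return True only if cl_id is unseen."""
    seen = conn.execute(
        "SELECT 1 FROM listings WHERE cl_id = ?", (cl_id,)
    ).fetchone()
    if seen:
        return False
    conn.execute("INSERT INTO listings (name, cl_id) VALUES (?, ?)", (name, cl_id))
    conn.commit()
    return True


first = record_if_new(conn, 1234567890, "Super Nintendo")   # unseen: an alert would fire
second = record_if_new(conn, 1234567890, "Super Nintendo")  # already stored: no alert

# Deleting the row, as remove_listing.py does, makes the listing count as new again.
conn.execute("DELETE FROM listings WHERE cl_id = ?", (1234567890,))
conn.commit()
third = record_if_new(conn, 1234567890, "Super Nintendo")
```

Under these assumptions `first` and `third` are `True` and `second` is `False`, which is the behavior the delete step is meant to exercise.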
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,5 +1,5 @@
python-craigslist>=1.0.3
sqlalchemy
python-dateutil
ipython
slackclient
beautiful-soup
