This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

Install guide

spladug edited this page Jun 28, 2011 · 46 revisions

These instructions will guide you through the process of setting up a reddit clone for the first time. We also have an automated install script for Ubuntu Linux.

Prerequisites

Before continuing with this guide, make sure you have all of reddit's many dependencies installed.

This guide will assume that you are installing reddit as user reddit in the directory /home/reddit. If this is not the case, modify the examples accordingly.

Get the code

Clone the git repository from GitHub.

$ git clone https://github.com/reddit/reddit.git

Once this is done, install the Python module dependencies and compile the Cython modules.

$ cd reddit/r2
$ sudo python setup.py develop
$ make

The setup.py script only needs to be run at installation time, but you will need to rerun make any time you modify a Cython file (*.pyx) so that it is recompiled.
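If you are unsure whether make needs rerunning, comparing timestamps is one way to find out. The sketch below assumes that the build leaves a generated .c file next to each .pyx file; that layout is an assumption, not something this guide states, and the function name stale_pyx is hypothetical.

```shell
# List .pyx files that are newer than their generated .c file,
# or that have no .c file at all.
# ASSUMPTION: the generated .c file sits next to the .pyx file.
stale_pyx() {
    dir=${1:-.}
    find "$dir" -name '*.pyx' | while IFS= read -r pyx; do
        c_file="${pyx%.pyx}.c"
        if [ ! -e "$c_file" ] || [ "$pyx" -nt "$c_file" ]; then
            printf '%s\n' "$pyx"
        fi
    done
}
```

Running stale_pyx ~/reddit/r2 prints one path per out-of-date file. If it prints anything, rerun make.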

PostgreSQL

PostgreSQL is reddit's primary data store. It is used for storing data on Accounts, Subreddits, Links, Comments, Votes, etc. In production, we use many separate database clusters, but for sites with less traffic (or test instances) a single postgres database should suffice.

Initialize postgres

This section may be unnecessary on your system. Check if your installation of postgres created a default database and start scripts for you.
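One quick check is to look for the PG_VERSION file that initdb writes into a data directory. The following is a minimal sketch; the function name pg_data_initialized and the PGDATA_CANDIDATE variable are hypothetical, and the default path is simply the one used later in this guide. Distribution packages often keep the data directory elsewhere.

```shell
# Succeed if the given directory looks like an initialized postgres
# data directory. initdb writes a PG_VERSION file into it.
pg_data_initialized() {
    data_dir=$1
    [ -f "$data_dir/PG_VERSION" ]
}

# ASSUMPTION: example path only; packaged installs may use another one.
PGDATA_CANDIDATE=${PGDATA_CANDIDATE:-/usr/local/pgsql/data}
if pg_data_initialized "$PGDATA_CANDIDATE"; then
    echo "initialized: $PGDATA_CANDIDATE"
else
    echo "not initialized (or not readable): $PGDATA_CANDIDATE"
fi
```

Data directories are usually readable only by the postgres account, so you may need to run the check as that user or as root.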

If one doesn't already exist, create an account for postgres to run under. This guide will assume the name postgres.

$ sudo adduser postgres

Create a directory for postgres to store its data in.

$ sudo mkdir -p /usr/local/pgsql/data
$ sudo chown postgres /usr/local/pgsql/data

Then, initialize the database.

$ sudo -u postgres initdb -D /usr/local/pgsql/data

Finally, start up the server.

$ sudo -u postgres postgres -D /usr/local/pgsql/data

Create the database

Create a database for reddit's data.

$ createdb -E utf8 reddit

Then add reddit's SQL functions to the schema.

$ cd ~/reddit/
$ psql -U reddit reddit < sql/functions.sql

Cassandra

Cassandra is currently used primarily as a permanent cache, but the goal is for it to become our primary data store. As such, it is a vital component of the reddit architecture.

To configure Cassandra for reddit, set up the necessary directories and replace the default cassandra.yaml with the one that comes with reddit.

# the /cassandra directory is configured in reddit's cassandra.yaml. you may change it if desired.
$ sudo mkdir /cassandra

# make sure the cassandra directory is accessible to the user cassandra will run as. 
$ sudo chown cassandra /cassandra

# the path to cassandra.yaml may vary depending on your system. change as necessary.
$ sudo mv /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak 
$ sudo ln -s ~/reddit/config/cassandra/cassandra.yaml /etc/cassandra/

Next, you need to create the keyspace for reddit and the permacache column family.

$ cassandra-cli 
[default@unknown] connect localhost/9160;
[default@unknown] create keyspace reddit with replication_factor = 1;
[default@unknown] use reddit;
[default@unknown] create column family permacache with column_type = 'Standard' and comparator = 'BytesType';

The reddit application will create the rest of the required column families automatically.

RabbitMQ

RabbitMQ is used for asynchronous job processing. User actions (such as creating a post or comment) push jobs onto a set of queues for tasks that need not be completed during the POST request. In addition to getting rabbit running, you will therefore need the services that remove jobs from these queues; they are covered in the Services section below.

Configuration of rabbit is relatively simple.

$ sudo rabbitmqctl add_vhost /
$ sudo rabbitmqctl add_user reddit reddit
$ sudo rabbitmqctl set_permissions -p / reddit ".*" ".*" ".*"

memcached

Almost everything in reddit depends on memcached running, and you won't be able to do much without it.

Most package managers will set up run scripts for memcached automatically. To check if it is already running, use telnet.

$ telnet localhost 11211

If telnet is unable to connect, start the memcached daemon.

$ memcached 
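The telnet check is interactive. For scripts, a non-interactive variant can send memcached's version command and read the reply. This is a sketch that assumes a netcat binary named nc is installed; the function name memcached_version is hypothetical.

```shell
# Print the version reported by a memcached server, or nothing if the
# server cannot be reached.
# ASSUMPTION: a netcat binary named "nc" is available.
memcached_version() {
    host=${1:-localhost}
    port=${2:-11211}
    # -w 2: give up after two seconds of inactivity
    printf 'version\r\nquit\r\n' \
        | nc -w 2 "$host" "$port" 2>/dev/null \
        | tr -d '\r' \
        | sed -n 's/^VERSION //p'
}
```

An empty result from memcached_version localhost 11211 means the daemon is not answering on that port.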

Test your installation

Before continuing, make sure that reddit is able to start up and connect to the databases.

$ cd ~/reddit/r2
$ paster serve --reload example.ini http_port=8080

You should be able to access reddit at http://127.0.0.1:8080/.
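If you prefer to check from the command line, something like the following sketch can be used. It assumes curl is installed, and the function name reddit_is_up is hypothetical.

```shell
# Succeed if the given URL answers with a non-error HTTP status.
# ASSUMPTION: curl is installed.
reddit_is_up() {
    url=${1:-http://127.0.0.1:8080/}
    # -s: quiet, -f: treat HTTP errors as failure,
    # -o /dev/null: discard the body, --max-time: overall timeout
    curl -sf -o /dev/null --max-time 5 "$url"
}
```

For example, reddit_is_up && echo "reddit is responding" prints the message only once paster is serving requests.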

Troubleshooting

The following are some of the more common problems at this point in the installation.

Error: ImportError: No module named wrapped
Resolution: You need to compile the Cython modules.
$ cd ~/reddit/r2
$ make

Error: ImportError: No module named rails.asset_tag
Resolution: setup.py installed the wrong versions of some dependencies. You must downgrade them.
$ sudo easy_install "Paste==1.7.2-reddit-0.2"
$ sudo easy_install "webhelpers==0.6.4"

Error: (OperationalError) FATAL: password authentication failed for user "reddit"
Resolution: The postgres user "reddit" has the wrong password. You should recreate the postgres user "reddit" with password "password".
reddit$ su postgres
postgres$ dropuser reddit
postgres$ createuser -P reddit

Populate with test data

For testing purposes, you can generate random test data for your reddit install.

$ cd ~/reddit/r2
$ paster shell example.ini
>>> from r2.models import populatedb
>>> populatedb.populate()

Note: this will also create a reddit account named reddit with password password.

Services and Cron Jobs

Configuration

The various services and cron jobs in the repository expect the existence of run.ini in ~/reddit/r2.

In ~/reddit/r2, there is a script called updateini.py which reads ini files and applies differences from a specified .update file. This means that you can make changes to the provided example.ini by modifying a .update file without worrying about merge issues down the line.

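To take advantage of this infrastructure, create a run.update file (it can start out empty) and run make so that run.ini is generated from example.ini and run.update.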

$ cd ~/reddit/r2
$ touch run.update
$ make 
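An empty run.update leaves the settings from example.ini unchanged. To override a setting, you would put it in run.update instead of editing example.ini. The sketch below writes such a file. The ini-style [DEFAULT] section and both key names are assumptions for illustration only, so check example.ini for the names your checkout actually uses. The function name write_sample_update is hypothetical.

```shell
# Write a sample run.update to the given path.
# ASSUMPTION: the file uses ini syntax with a [DEFAULT] section, and
# the keys below are placeholders, not confirmed reddit settings.
write_sample_update() {
    target=${1:-run.update}
    cat > "$target" <<'EOF'
[DEFAULT]
debug = false
domain = reddit.example.com
EOF
}
```

After calling write_sample_update ~/reddit/r2/run.update, rerun make in ~/reddit/r2 to regenerate run.ini.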

Services

The reddit repository includes a srv/ directory where daemontools runscripts are kept. To set these up, you must point daemontools at the runscripts.

# the /service/ path may vary. on debian/ubuntu use /etc/service instead. 
$ sudo ln -s ~/reddit/srv/* /service/

This will run two instances of the reddit app as well as an haproxy instance to balance between them. Runscripts are included for memcached and cassandra as well. If you are already running these services through another method, you can delete the symlinks to avoid issues. There are also several runscripts for queue processors and the like. See Services for more details on what these do.

Crons

There are several jobs that need to be run periodically to update the site. The following is a recommended crontab. See Cron Jobs for information on what each of these does.

# m   h dom mon dow    command
*/5   *   *   *   *    ~/reddit/scripts/rising.sh
*/4   *   *   *   *    ~/reddit/scripts/send_mail.sh
*/3   *   *   *   *    ~/reddit/scripts/broken_things.sh
1     *   *   *   *    ~/reddit/scripts/update_promos.sh
0    23   *   *   *    ~/reddit/scripts/update_reddits.sh
30   23   *   *   *    ~/reddit/scripts/update_sr_names.sh
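If the reddit user already has a crontab, these entries can be merged in without duplicating lines. The following is a sketch: the function name merge_crontab and the file names in the usage note are hypothetical. The function only prints the merged result and does not touch the live crontab.

```shell
# Print the current crontab followed by any lines from the additions
# file that are not already present in it.
merge_crontab() {
    current=$1
    additions=$2
    cat "$current"
    # -F: fixed strings, -x: whole-line match, -v: invert the match,
    # -f: read patterns from a file. grep exits non-zero when nothing
    # is new, which is not an error here.
    grep -Fxv -f "$current" "$additions" || true
}
```

To apply it, save the entries above as reddit.cron, dump the existing crontab with crontab -l > current.cron, and pipe the output of merge_crontab current.cron reddit.cron into crontab - to install the result.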

Troubleshooting

Check the FAQ for help with any issues you may be encountering at this point.