These instructions will guide you through the process of setting up a reddit clone for the first time. We also have an automated install script for Ubuntu Linux 14.04.
We also currently maintain a Vagrantfile that can help reduce the number of setup steps.
Before continuing with this guide, make sure you have all of reddit's many dependencies installed.
This guide will assume that you are installing reddit as a user whose home directory is /home/reddit. If this is not the case, modify the paths in the commands below accordingly.
Get the code
Clone the git repository available here on github.
$ git clone https://github.com/reddit/reddit.git
Once this is done, you'll need to install the python module dependencies.
$ cd reddit/r2
$ python setup.py build
$ sudo python setup.py develop
$ make
The setup.py develop command only needs to be run at installation time, but make will need to be rerun any time you modify a Cython file (*.pyx) so that the code can be compiled.
PostgreSQL is reddit's primary data store. It is used for storing data on Accounts, Subreddits, Links, Comments, Votes, etc. In production, we use many separate database clusters, but for sites with less traffic (or test instances) a single postgres database should suffice.
This guide assumes you have a working PostgreSQL install running under the unix user postgres, and that you can access it with sudo -u postgres psql. That's the norm if you installed PostgreSQL on Ubuntu/Debian/Fedora from operating system packages, and on Mac OS X using Homebrew. To test that, copy and paste the following command to your terminal and run it:
sudo -u postgres psql -qAt -c "SELECT 'connected ok, superuser: ' || (select usesuper from pg_user where usename = CURRENT_USER)||', version: '||version()"
It should print something like:
connected ok, superuser: true, version: PostgreSQL 9.3.1
You want "superuser: true", since you'll be doing some database setup tasks.
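If you want to script this check, the output line can be validated mechanically. The following is a minimal sketch that assumes the output has exactly the shape shown above; the helper name parse_check_line is hypothetical and is not part of reddit or PostgreSQL.

```python
import re

# Hypothetical helper: matches the single line printed by the psql check above.
CHECK_RE = re.compile(
    r"^connected ok, superuser: (?P<superuser>\w+), version: (?P<version>.+)$"
)


def parse_check_line(line):
    """Return (is_superuser, version_string); raise ValueError on unexpected output."""
    match = CHECK_RE.match(line.strip())
    if match is None:
        raise ValueError("unexpected psql output: %r" % line)
    return match.group("superuser") == "true", match.group("version")
```

Passing the sample line from above should yield a true superuser flag together with the version string, which you can then compare against the versions PostgreSQL still supports.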
Make sure the PostgreSQL version printed is a reasonable, supported version; see PostgreSQL's version policy. Mac OS X users need to be particularly wary that the version printed is the same as the version they think they installed, as Apple installs an old version of PostgreSQL as part of Mac OS X.
If you installed PostgreSQL from source code instead of operating system packages or an installer you might need to create a database cluster first, as well as install the appropriate startup scripts. It's generally much better to install PostgreSQL from operating system packages. If your OS has only very old versions of PostgreSQL, see the PostgreSQL download page for alternatives, including deb (apt) and rpm (yum) packages.
Create the database
Create a database for reddit's data.
$ sudo -u postgres createdb -E utf8 reddit
And a user for the code to connect with.
$ sudo -u postgres psql reddit
> CREATE USER reddit WITH PASSWORD 'password';
> \q
Then add reddit's SQL functions to the schema.
$ cd ~/reddit/
$ sudo -u postgres psql reddit < sql/functions.sql
Cassandra is a vital component of the reddit architecture that stores many pieces of data used throughout the site.
You must create the keyspace for reddit and the permacache column family.
$ cassandra-cli -h localhost
[default@unknown] create keyspace reddit;
[default@unknown] use reddit;
[default@unknown] create column family permacache with column_type = 'Standard' and comparator = 'BytesType';
The reddit application will create the rest of the required column families automatically.
RabbitMQ is used for asynchronous job processing. User actions (such as creating a post or comment) push jobs onto a set of queues for tasks that need not be done during the POST. As such, in addition to getting RabbitMQ running, you need a set of services that remove jobs from these queues; these are covered under Services and Cron Jobs.
Configuration of rabbit is relatively simple.
$ sudo rabbitmqctl add_vhost /
$ sudo rabbitmqctl add_user reddit reddit
$ sudo rabbitmqctl set_permissions -p / reddit ".*" ".*" ".*"
The rabbitmq web management interface is accessible on port 15672 with the credentials guest:guest. If you are using the Vagrantfile for setup, you can also access the interface on your host machine on the same port.
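To make the queueing pattern described earlier in this section concrete (the request only enqueues work, and separate consumers process it later), here is a small in-process sketch. It uses Python 3's standard queue and threading modules rather than RabbitMQ, and the function name run_jobs and the job strings are purely illustrative.

```python
import queue
import threading


def run_jobs(jobs, worker_count=2):
    """Enqueue jobs, let background workers drain the queue, return sorted results."""
    job_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:              # sentinel: no more work for this worker
                job_queue.task_done()
                return
            with results_lock:
                results.append("processed " + job)
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:                     # the "request handler" only enqueues
        job_queue.put(job)
    for _ in threads:                    # one sentinel per worker
        job_queue.put(None)
    job_queue.join()                     # wait until every item has been handled
    for thread in threads:
        thread.join()
    return sorted(results)
```

In the real deployment the queue lives in RabbitMQ and the workers are the separate consumer services, so the web process never waits on them.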
Almost everything in reddit depends on memcached running, and you won't be able to do much without it.
Most package managers will set up run scripts for memcached automatically. To check if it is already running, use telnet.
$ telnet localhost 11211
If that is unable to connect, you must start the memcache daemon.
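If telnet is not installed, the same reachability check can be done with a few lines of Python 3. This is only a sketch: the helper name port_open is hypothetical, and it verifies that something accepts TCP connections on the port, not that the listener is actually memcached.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                      # refused, unreachable, or timed out
        return False
```

For example, port_open("localhost", 11211) checks the port used in the telnet command above.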
Test your installation
Before continuing, make sure that reddit is able to start up and connect to the databases.
$ cd ~/reddit/r2
$ paster serve --reload example.ini http_port=8081
You should be able to access reddit at http://127.0.0.1:8081/.
Following are some of the more commonly seen problems at this point in the installation.
ImportError: No module named wrapped
You need to compile the Cython modules.
$ cd ~/reddit/r2 && make

(OperationalError) FATAL: password authentication failed for user "reddit"
The postgres user "reddit" has the wrong password. Recreate the postgres user "reddit" with the password "password".
reddit$ su postgres
postgres$ dropuser reddit
postgres$ createuser -P reddit

File "/usr/lib/python2.7/dist-packages/gunicorn/config.py", line 356, in validate_group
    raise ConfigError("No such group: '%s'" % val)
gunicorn.errors.ConfigError: No such group: 'NAME'
gunicorn tries to use a group with the same name as your user (NAME in the message above). A simple fix is to create a group with that name and add your user to it.
$ sudo groupadd NAME && sudo usermod -a -G NAME NAME

TypeError: cannot concatenate 'str' and 'tuple' objects
You may have a bad version of Python requests.
Populate with test data
For testing purposes, you can generate random test data for your reddit install.
$ cd ~/reddit
$ reddit-run scripts/inject_test_data.py -c 'inject_test_data()'
Note: this will also create a reddit account named
Services and Cron Jobs
In ~/reddit/r2, there is a script called updateini.py which reads ini files and applies differences from a specified .update file. This means that you can make changes to the provided example.ini by modifying a .update file without worrying about merge issues down the line.
To take advantage of this infrastructure do the following.
$ cd ~/reddit/r2
$ touch run.update
$ make
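The overlay idea behind the .update file can be sketched with Python 3's configparser. This is not the actual updateini.py, whose behavior may differ in details; the function name apply_update is hypothetical, and the sketch only shows base settings being overridden by a second ini-format text.

```python
import configparser
import io


def apply_update(base_text, update_text):
    """Overlay the options in update_text on base_text and return the merged ini text."""
    # interpolation=None leaves literal '%' characters in values untouched
    merged = configparser.ConfigParser(interpolation=None)
    merged.optionxform = str             # preserve the case of option names
    merged.read_string(base_text)
    merged.read_string(update_text)      # values read later override earlier ones
    out = io.StringIO()
    merged.write(out)
    return out.getvalue()
```

With this approach, the base file stays pristine and a small override file carries only your local changes, which is what keeps later merges of example.ini painless.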
reddit's various services are managed with Upstart. You need a little bit of configuration to get started; change the settings to match your setup:
# cat > /etc/default/reddit <<REDDIT
export REDDIT_ROOT=$REDDIT_HOME/reddit
export REDDIT_INI=$REDDIT_HOME/reddit/r2/run.ini
export REDDIT_USER=$REDDIT_USER
export REDDIT_CONSUMER_CONFIG=$REDDIT_HOME/consumer-counts.d
alias wrap-job=$REDDIT_HOME/reddit/scripts/wrap-job
alias manage-consumers=$REDDIT_HOME/reddit/scripts/manage-consumers
REDDIT
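Because the heredoc delimiter above is unquoted, the shell expands $REDDIT_HOME and $REDDIT_USER at the moment the file is written, so both variables must already be set in that shell. The sketch below shows the text that would end up in /etc/default/reddit for given values; render_defaults is a hypothetical helper, and the values in the usage note are examples only.

```python
def render_defaults(reddit_home, reddit_user):
    """Return the file contents the heredoc would write, with both variables expanded."""
    lines = [
        "export REDDIT_ROOT={home}/reddit",
        "export REDDIT_INI={home}/reddit/r2/run.ini",
        "export REDDIT_USER={user}",
        "export REDDIT_CONSUMER_CONFIG={home}/consumer-counts.d",
        "alias wrap-job={home}/reddit/scripts/wrap-job",
        "alias manage-consumers={home}/reddit/scripts/manage-consumers",
    ]
    return "\n".join(lines).format(home=reddit_home, user=reddit_user) + "\n"
```

Calling render_defaults("/home/reddit", "reddit") produces a file whose first line sets REDDIT_ROOT to /home/reddit/reddit. If either variable is unset when you run the heredoc, the corresponding paths will be written without their prefix.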
Create the consumer-counts.d directory mentioned above. For each asynchronous job queue, create a file in that directory named after the queue, and make the contents of the file the number of processors to run. For example:
$ cd $REDDIT_HOME/consumer-counts.d
$ echo 1 > scraper_q
$ echo 1 > commentstree_q
$ echo 1 > newcomments_q
$ echo 1 > vote_comment_q
$ echo 1 > vote_link_q
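The same one-file-per-queue layout can be generated from a mapping, which is convenient when you manage many queues. This is a minimal sketch; write_consumer_counts is a hypothetical helper, not a script shipped with reddit.

```python
import os


def write_consumer_counts(directory, counts):
    """Create one file per queue in directory; each file holds that queue's consumer count."""
    os.makedirs(directory, exist_ok=True)
    for queue_name, count in counts.items():
        path = os.path.join(directory, queue_name)
        with open(path, "w") as handle:
            handle.write("%d\n" % count)   # same content as `echo 1 > queue_name`
```

For instance, passing {"scraper_q": 1, "vote_link_q": 1} creates the first and last files from the shell example above.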
Then, copy the job configuration files from the upstart/ directory to /etc/init/:
$ sudo cp ~/reddit/upstart/* /etc/init/
You can then start up all the processors with:
$ sudo initctl emit reddit-start
There are several jobs that need to be run periodically to update the site. These jobs are also managed with upstart. Following is a recommended cron setup. See Cron Jobs for information on what each of these does.
0 3 * * * root /sbin/start --quiet reddit-job-update_sr_names
30 16 * * * root /sbin/start --quiet reddit-job-update_reddits
0 * * * * root /sbin/start --quiet reddit-job-update_promos
*/5 * * * * root /sbin/start --quiet reddit-job-clean_up_hardcache
* * * * * root /sbin/start --quiet reddit-job-email
*/2 * * * * root /sbin/start --quiet reddit-job-broken_things
*/2 * * * * root /sbin/start --quiet reddit-job-rising
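These entries use the system crontab layout, which has a user column between the five schedule fields and the command. As a reading aid, the sketch below splits one such line into named fields; parse_system_cron_line is a hypothetical helper and does not validate the schedule values themselves.

```python
def parse_system_cron_line(line):
    """Split a system crontab line into its schedule fields, user, and command."""
    parts = line.split(None, 6)   # 5 schedule fields, the user, then the command
    if len(parts) != 7:
        raise ValueError("expected 5 schedule fields, a user, and a command: %r" % line)
    minute, hour, day_of_month, month, day_of_week, user, command = parts
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
        "user": user,
        "command": command,
    }
```

Applied to the clean_up_hardcache entry, the minute field is */5 (every five minutes), the user is root, and the command is the /sbin/start invocation that triggers the Upstart job.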
Check the FAQ for help with any issues you may be encountering at this point.