-
Make sure JAVA_HOME path is added in the "~/.bashrc" file
Typesudo vim ~/.bashrc
and add the path in the end. The path is usually in the/usr/lib/jvm/
folder.Check the Path by typing
echo $JAVA_HOME
-
Make sure curl is installed, else install it using apt-get
-
To install Cassandra on your node, type the following commands, or Follow these instructions: http://cassandra.apache.org/download/:
echo "deb https://downloads.apache.org/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
Add the Apache Cassandra repository keys:
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
Update the repositories:
sudo apt-get update
Install Cassandra:
sudo apt-get install cassandra
-
Start Cassandra with
sudo service cassandra start
and check the status usingnodetool status
andsudo service cassandra status
commands:
-
Stop Cassandra on all three nodes:
sudo service cassandra stop
-
Delete the default dataset:
sudo rm -rf /var/lib/cassandra/data/system/*
-
Edit the configuration file cassandra.yaml on all 3 nodes.It is located in the /etc/cassandra directory:
a. The directive
cluster_name
should be same across all the nodes:
cluster_name: 'Test Cluster'
b. Update
-seeds:
by adding all 3 IP addresses, separated by commas:
seeds: "206.81.9.191,67.205.176.120,206.81.9.184"
c. Update
listen_address
andrpc_address
on each node to the IP address to that node.
For "206.81.9.191", it should look like:
listen_address: 206.81.9.191
rpc_address: 206.81.9.191
d. Update
endpoint_switch
fromSimpleSnitch
toGossipingPropertyFileSnitch
:
endpoint_snitch: GossipingPropertyFileSnitch
e. Finally, add the following directive at the end of the file:
auto_bootstrap: false
Save the cassandra.yaml file all on 3 nodes
-
Start Cassandra on all 3 nodes again:
sudo service cassandra start
-
Check the status of the clusters with the
sudo nodetool status
command. All 3 nodes should be listed:
-
Make sure Python is installed on the VM, and then run CQL with the
cqlsh <nodeIPAdress>
command. -
In the CQL Terminal, create a keyspace:
CREATE KEYSPACE cloud_log WITH replication = {'class': 'SimpleStrategy','replication_factor':'1'} AND durable_writes = 'true';
Use that keyspace with the
use cloud_log ;
command. -
Create a table in the cloud_log keyspace with all the Common Log Format columns :
CREATE TABLE access_log (IP_Address text, client_id text, client_username text, time text,timezone text, request_line text, status_code text, object_size text, PRIMARY KEY(IP_Address,time,request_line));
Here, we are using the combination of IP_Address, timestamp and request line as the primary key.
-
Import the access_log data into your table using the copy command:
COPY access_log (IP_Address , client_id , client_username , time , timezone , request_line , status_code, object_size) FROM '/home/student/access_log' WITH DELIMITER=' ';
Check the data with a simple select query:
select * from access_log where IP_Address = '10.207.188.188' limit 5;
-
In order to run CQL Queries on
path
, we need to make another table withpath
as the main primary key.
CREATE TABLE Request_line_details (IP_Address text, time text,method text, path text, protocol text, PRIMARY KEY(path,IP_Address,time,method,protocol));
-
To load data into this table, we first need to split the
request_line
column. In order to do so, we first export the data into a new csv file:
COPY cloud_log.access_log (ip_address,time,request_line) TO 'requestline.csv' with DELIMITER = ' ' and QUOTE = ' ';
-
Next, we copy data from requestline.csv to the
request_line_details
table:
COPY cloud_log.request_line_details (IP_ADDRESS,time,method, path, protocol) FROM 'requestline.csv' WITH DELIMITER = ' ' and QUOTE = '"' and ESCAPE = '*';
Check the data with a simple select query:
select * from request_line_details limit 5;
-
How many hits were made to the website item “/assets/img/release-schedule-logo.png”?
Ans: 24282select count(*) from request_line_details where path = '/assets/img/release-schedule-logo.png\';
-
How many hits were made from the IP: 10.207.188.188?
Ans: 398select count(*) from cloud_log.access_log where IP_Address = '10.207.188.188' ;
First run pip install cassandra-driver
-
Which path in the website has been hit most? How many hits were made to the path?
Ans: Path: /assets/css/combined.css has been hit 115101 timesRun the script:
python Cassandra_most_path.py
-
Which IP accesses the website most? How many accesses were made by it?
Ans: IP: 10.216.113.172 has been accessed 158298 timesRun the script:
python Cassandra_most_ip.py