**PySDS Week 4. Lecture 2. V.1**
Last author: B. Hogan

# Week 4. Day 2. Working on a server 

Large code examples will not always run on a personal computer. The IMDB code is really at the limit of what you might expect to do practically on a laptop. For many tasks there is a reason to run them remotely. For example:
- You want to listen to a stream of data and you don't want to keep your laptop open and connected.
- You have too much data for your computer to load.
- You need processing power that's not available locally.

For small tasks, a boost in RAM might make a big difference, but for tasks on gigabytes of data or with persistent connections, it won't. What makes a difference is using a dedicated machine with a known history of continuous uptime.

Linux is an operating system like macOS or Windows. It is most commonly seen in scientific work and in server administration. It does not always support the hardware of consumer devices (for example, there have yet to be reports of a Linux distribution that can drive the fingerprint reader on the Yoga 920, much to my disappointment). Linux, modelled on the Unix operating system, can be administered quite extensively from a command prompt. In fact, the prompt runs in a shell, which has its own language. Typically on a Mac or Linux you would be using bash, the Bourne-again shell. The good thing about this is that terminals are then easy to access remotely.

We can access another computer's shell remotely if we know the address of the server and it is configured for SSH. In that case we use the following syntax (in the Mac terminal, the Linux shell and Cygwin for Windows).
``` bash
ssh USERNAME@domain.com
```
or 
``` bash
ssh -p PORT USERNAME@domain.com
```
In this case the domain is ```<redacted>.oii.ox.ac.uk```. 
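
As a rough sketch, the same connection command can be assembled in Python, which makes the position of each part explicit. The helper name `build_ssh_command` and the example values are invented for illustration; `-p` is the OpenSSH option for a non-default port.

```python
def build_ssh_command(user, host, port=None):
    """Return the ssh command as a list of arguments (hypothetical helper)."""
    command = ["ssh"]
    if port is not None:
        # OpenSSH takes the port as a separate -p option
        command += ["-p", str(port)]
    command.append("%s@%s" % (user, host))
    return command

# Example values only; substitute your own username and server
print(" ".join(build_ssh_command("USERNAME", "domain.com")))
print(" ".join(build_ssh_command("USERNAME", "domain.com", port=2222)))
```

A list of arguments like this could be handed to `subprocess.run` if you ever wanted to script the connection, though for this class typing the command in the terminal is all that is needed.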

* Important note: I have had trouble giving my password via Windows Powershell, so I recommend downloading and installing Cygwin with the optional OpenSSH modules when you get to the install screen. This will be shown in class.

When you first log in it will ask you to trust the key; type yes. Then either type your password or copy and paste it. This is fragile. Please be systematic and careful. It will lock you out after 5 failed attempts. I have been given instructions to reset the lockout, but I cannot guarantee I'll be able to use them properly. **Measure twice, cut once**.  

## Navigating the server

The server can be navigated with the same commands as on a Mac, since both are Unix-like systems. This includes:
- ```cd``` change directory, recall ```~``` is home, ```.``` is here and ```..``` is up one 
- ```ls``` list directory; the argument ```-a``` means all, i.e. ```ls -a```
- ```man``` the help page, so for help on other arguments for ls it would be ```man ls```
- ```touch``` creates a new file.
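
For comparison, here is a small sketch of the same navigation steps done from Python with the standard `pathlib` and `os` modules. It works inside a temporary directory so nothing on your machine is touched; the file names are invented for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    here = Path(tmp)
    (here / "notes.txt").touch()   # like: touch notes.txt
    (here / ".hidden").touch()     # files starting with a dot are hidden from plain ls
    names = sorted(p.name for p in here.iterdir())
    visible = [name for name in names if not name.startswith(".")]
    everything = names             # ls -a would also list . and ..
    print("ls    :", visible)
    print("ls -a :", everything)
    print("..    :", here.parent)                # up one directory
    print("~     :", os.path.expanduser("~"))    # the home directory
```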

## Copying files to a server 

To copy files to a server you can use scp (or secure copy) through both cygwin and \*nix systems. To do this scp is run from the terminal (outside of ssh) with remote and local file paths as arguments. 

``` bash
# download: remote -> local
scp user@remote_host:remote_file local_file 
```

The local file can also be a directory where the file would go. To upload, the local file is placed first:

``` bash
# upload: local -> remote
scp local_file user@remote_host:remote_file
```

So if you have the Python file "twitterServer.py" on your computer at ~/Desktop/twitterServer.py, you would type:

```
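cd ~/Desktop/
# the trailing colon marks the destination as remote; with nothing after it, the file lands in your home directory
scp twitterServer.py inetXXXX@<redacted>.oii.ox.ac.uk: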
```
It should first prompt you for a password. If successful it will show the progress of the file copy and then complete.
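
One detail that is easy to miss: scp only treats an argument as remote if it contains a colon. Without one, the file is simply copied to a local file with that name. The sketch below assembles an upload command in Python to make the colon explicit. The helper name `build_scp_command` and the example values are hypothetical.

```python
def build_scp_command(local_file, user, host, remote_path=""):
    """Return an scp upload command as a list of arguments (hypothetical helper)."""
    # The colon marks the destination as remote; an empty path after it
    # means the remote home directory.
    destination = "%s@%s:%s" % (user, host, remote_path)
    return ["scp", local_file, destination]

# Example values only
command = build_scp_command("twitterServer.py", "USERNAME", "domain.com")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```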

## Editing text on the server

There are a few ways to edit text on a server. There are two basic text editors: ```nano``` and ```vi```. Many hardcore programmers love vi because it employs a huge variety of keyboard shortcuts. It's for the same reason that most people find vi to be a huge pain. There are even games out there to help you improve your vi skills, but I personally think they will be futile without serious commitment. Regardless, I actually don't mind tweaking things in ```vi``` when I'm working on a server.

vi is started with the command ```vi```. You are then presented with a blank screen with 

```bash
~
~
~
~
```
going down the left-hand side. These mark the 'end of the document'. You cannot type right away in vi, but instead have to switch to one of its editing modes. Pressing ```i``` will do that, then you can type. Press escape and you are out of editing mode. Then in order to give a command you have to press ```:```. To write, you would type ```w``` and press enter. To save and exit you would type ```wq```; to quit without saving it is ```q!```. I will demonstrate this, but then return to it here. It is confusing but it has a logic to it, just a foreign one to most students here. 

Follow along as we will first create a python file in ```vi```, then copy it to the server, log into the server, run it and then exit. 

The file is going to be called "example.py". It will be really simple: 

``` python
import datetime
import time

# Print the current time every three seconds until interrupted
while True:
    print("The time is now: %s" % datetime.datetime.now())
    time.sleep(3)
```

## Running a program on a server

You will notice that when we run it on the server that it keeps going until we stop it, which we can do with a keyboard interrupt. But what happens if we want to leave the server, does it continue running? No. The shell that you create when logging into the terminal only lives while you are running it. It is destroyed when the connection is destroyed. 
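
A keyboard interrupt reaches Python as a `KeyboardInterrupt` exception, so a script can catch it and exit tidily instead of printing a traceback. The following is a minimal sketch along those lines. The function name `run_clock` and its `max_ticks` parameter are additions for illustration, so that the loop can also be stopped without pressing ctrl-c.

```python
import datetime
import time

def run_clock(max_ticks=None, delay=3):
    """Print the time every `delay` seconds until interrupted (illustrative)."""
    ticks = 0
    try:
        while max_ticks is None or ticks < max_ticks:
            print("The time is now: %s" % datetime.datetime.now())
            ticks += 1
            time.sleep(delay)
    except KeyboardInterrupt:
        # ctrl-c lands here instead of ending the script with a traceback
        print("Stopped by keyboard interrupt")
    return ticks

# Two quick ticks for demonstration; call run_clock() to loop until ctrl-c
run_clock(max_ticks=2, delay=0)
```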

In order to keep it running on the server, it has to be run from a shell that is not tied to ssh. To do this we use a **multiplexer**. That is a program that is going to create a second shell window for us that we can check in on and leave. If we have left it then we can get back to it. 

To do this we use ```screen```. This program is a multiplexer that will spawn a new instance of a terminal for you to use every time you type screen. It then displays that window. From this second window you can run commands, then exit the screen and the commands will still keep running. Let's first ```screen``` then run the python file. 

How do we escape this screen? It does not give a huge amount of feedback, but you would want to press ```ctrl-a```, then ```d```. Control-a first lets screen know you are going to enter a command. Then ```d``` is the command for **detaching**. This should bring you back to the main terminal window. To reattach you should type: ```screen -r```. If you happen to have more than one screen it will list them, each identified by its process id (```pid```). You can type ```screen -r <pid>``` to reattach to the one you want. As a tip, you can name a screen when you first create it by typing ```screen -S <name>``` and then reattach with ```screen -r <name>```.

# Section 2. Creating a Twitter Stream listener 

A Twitter stream listener is useful if you want to collect your own live data from the site. First let's check that the module is installed correctly. 

In [None]:
try: 
    import tweepy
except ModuleNotFoundError:
    import sys
    !{sys.executable} -m pip install git+https://github.com/tweepy/tweepy.git
    import tweepy

If we don't get an error then we should be all good. Now let's go over to Twitter to get some API keys. We start at https://developer.twitter.com/ and then go to "apps" under your name. We want to create a new app, get the keys and the secret keys, and then make use of them. We can do this in a similar way to what we did with the API keys from Reddit (i.e. create the JSON file, close it, then delete the keys from the script). Bear in mind you will have to upload both the JSON file and the script to the server later. 

In [None]:
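1/0  # Guard: raises ZeroDivisionError so this cell cannot run by accident and overwrite the keys file; remove to run
import json 

# Fill in your own values, run once, then delete them from the script
keys = {"CONSUMER_KEY":"", 
        "CONSUMER_SECRET":"", 
        "ACCESS_TOKEN":"", 
        "ACCESS_TOKEN_SECRET":"",
        "gmail":""}

with open("twitter_keys.json",'w') as outfile:
    json.dump(keys, outfile)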
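
Since the keys file is filled in by hand, it can be worth checking it before starting a long-running job. Below is a minimal sketch of such a check. The helper name `load_keys` and the `REQUIRED` list are assumptions for illustration, and the demonstration uses a throwaway file with dummy values rather than real credentials.

```python
import json
import os
import tempfile

REQUIRED = ["CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"]

def load_keys(path, required=REQUIRED):
    """Load a keys file and fail early if a required entry is missing or blank."""
    with open(path) as keyfile:
        keys = json.load(keyfile)
    missing = [name for name in required if not keys.get(name)]
    if missing:
        raise ValueError("Missing or empty keys: %s" % ", ".join(missing))
    return keys

# Demonstration with a throwaway file and dummy values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_keys.json")
    with open(demo_path, "w") as outfile:
        json.dump({name: "dummy" for name in REQUIRED}, outfile)
    demo_keys = load_keys(demo_path)
    print(sorted(demo_keys))
```

Failing with a clear message at load time is easier to debug than an authentication error that only appears once the script is running on the server.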

In [None]:
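import json
import time

TWEETFILE = "Tweet_Output.dat"

# Read the keys saved earlier; the with block closes the file for us
with open("twitter_keys.json") as keyfile:
    keys = json.load(keyfile)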

auth = tweepy.OAuthHandler(keys['CONSUMER_KEY'],keys['CONSUMER_SECRET'])
auth.set_access_token(keys['ACCESS_TOKEN'], keys['ACCESS_TOKEN_SECRET'])

api = tweepy.API(auth)

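# Note: a tweepy.API object is always truthy, so this only confirms that the
# object was created; it does not contact Twitter to validate the keys.
if api:
    print("Successfully Authenticated")
else:
    print("Problems with authentication")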

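# Note: tweepy.StreamListener is part of tweepy 3.x; it was removed in tweepy 4.0.
class CustomStreamListener(tweepy.StreamListener):

    def __init__(self, limit=100, outfile="fileout.dat", counter=10):
        super().__init__()      # let tweepy set up the base listener
        self.count = 0          # messages seen so far
        self.limit = limit      # stop once this many messages have arrived
        self.counter = counter  # print a progress line every `counter` messages
        self.fileout = open(outfile, 'a')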
        
    def on_error(self, status_code):
        print ('Encountered error with status code:', status_code)
        
        return True # Don't kill the stream

    def on_timeout(self):
        print('Timeout...')
        time.sleep(1)
        return True # Don't kill the stream

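    def on_data(self, data):
        # Called once per raw message from the stream; data is a JSON string
        self.count += 1
        if self.count % self.counter == 0:
            print("Processing Tweet: %s" % self.count)
        if self.count == self.limit:
            # The message that reaches the limit is not written
            self.fileout.close()
            return False  # Returning False stops the stream
        else:
            self.fileout.write(data.strip() + "\n")
            return True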
        
# Notice that this instantiates the stream listener but it does not start it. 
# The output file is passed explicitly so the tweets end up in TWEETFILE.
streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(outfile=TWEETFILE), timeout=60)

# This is the filter we use; filters on twitter can be very complex. 
TWEET_FILTER = ["Trump"]

# This starts the stream listener. 
streaming_api.filter(follow=None, track=TWEET_FILTER)
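
To see how the counting and the limit interact without connecting to Twitter, here is a small stand-in that copies the `on_data` logic and is fed fake messages. `CountingWriter` is a made-up class for this illustration and is not part of tweepy. It writes to an in-memory buffer instead of a file.

```python
import io
import json

class CountingWriter:
    """Stand-in for the listener's on_data logic; not a tweepy class."""

    def __init__(self, fileout, limit=100):
        self.count = 0
        self.limit = limit
        self.fileout = fileout

    def on_data(self, data):
        self.count += 1
        if self.count == self.limit:
            return False          # signal the caller to stop
        self.fileout.write(data.strip() + "\n")
        return True

# Feed five fake messages into a writer with limit=3
buffer = io.StringIO()
writer = CountingWriter(buffer, limit=3)
for n in range(5):
    message = json.dumps({"text": "fake tweet %d" % n})
    if writer.on_data(message) is False:
        break

saved = buffer.getvalue().splitlines()
print(len(saved), "messages saved")
```

With a limit of 3, only 2 messages are saved, because the message that hits the limit is not written. By the same logic, the default limit of 100 stores 99 tweets.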

# Section 3. Email warnings

Building in an email warning is a useful way to alert you if something goes wrong on the server. We use Gmail since Google lets us create app passwords that are specific to the program, so the script can log in without a two-factor prompt. 

In [None]:
import time
import smtplib
import datetime

def send_email(test=True, text = "",pw=""):

    if pw == "":
        print("Did not include a password")
        return False
    else:
        gmail_pwd = pw # Use your own password! - see https://security.google.com/settings/security/apppasswords

    gmail_user = "bernie.hogan@gmail.com"
    FROM = "bernie.hogan@gmail.com"
    TO = [""]  # Put the address(es) that should receive the alert here
    SUBJECT = "Help, the stream is broken!"
    TEXT = "The stream produced an error. Please return to the server and check it out. %s" % text
    # Prepare actual message
    message = """From: %s\nTo: %s\nSubject: %s\n\n%s
    """ % (FROM, ", ".join(TO), SUBJECT, TEXT)

    print(message)
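    try:
        server = smtplib.SMTP("smtp.gmail.com", 587)
        server.ehlo()
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(gmail_user, gmail_pwd)
        server.sendmail(FROM, TO, message)
        server.quit()
        print('successfully sent the mail')
        return True
    except Exception as e:
        # Report the reason rather than silently swallowing every error
        print("failed to send mail: %s" % e)
        return False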

send_email(text = "False alarm, just starting the program %s" % datetime.datetime.now() ,pw=keys["gmail"])
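
As an alternative to formatting the headers by hand, the standard library's `email.message.EmailMessage` can build the message. This is only a sketch of that option, not what the function above does. The helper name `build_alert` and the example.com addresses are placeholders.

```python
from email.message import EmailMessage

def build_alert(sender, recipients, subject, body):
    """Return an EmailMessage with the usual headers set (hypothetical helper)."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    message.set_content(body)
    return message

# Placeholder addresses; example.com is reserved for documentation
alert = build_alert("sender@example.com", ["you@example.com"],
                    "Help, the stream is broken!", "The stream produced an error.")
print(alert)
```

An object like this can be passed to the `send_message` method of an `smtplib.SMTP` connection in place of the `sendmail` call.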

Now you can embed this function into your program: wrap the stream listener in a try / except statement and, if it fails, the exception handler will email you to say that there was an issue. Like so: 

In [None]:
import json 
keys = json.loads(open("twitter_keys.json").read())


In [None]:
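try:
    # This starts the stream listener. 
    streaming_api.filter(follow=None, track=TWEET_FILTER)
    1/0  # Deliberate error to test the email alert; remove once it works
except Exception as e:
    # send_email returns early without a password, so pass the app password too
    send_email(text = "We received the following error that stopped the program: %s" % e,
               pw=keys["gmail"])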
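
The same pattern can be written as a small reusable wrapper and tried out without a stream or a mail server. In this sketch `run_with_alert`, `broken_task` and the `alerts` list are all stand-ins invented for the example. In the real script the task would start the stream and the alert would call `send_email`.

```python
def run_with_alert(task, alert):
    """Run task(); if it raises, pass a description to alert() and return False."""
    try:
        task()
        return True
    except Exception as e:
        alert("We received the following error that stopped the program: %s" % e)
        return False

# Stand-ins for the stream and the email function
def broken_task():
    raise RuntimeError("connection dropped")

alerts = []
ok = run_with_alert(broken_task, alerts.append)
print(ok, alerts)
```

Note that `KeyboardInterrupt` is not a subclass of `Exception`, so stopping the program yourself with ctrl-c would not trigger an alert.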

# Section 4. Checking the data. 

First we will want to get the data off the server using ```scp```, then we will want to parse it. I placed the data in a flat file with one tweet object per line. These days Twitter has an 'extended_tweet' object for long tweets. 

See the code snippet below

In [None]:
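import json

TWEETFILE = "Tweet_Output.dat"

with open(TWEETFILE) as filein:
    for line in filein:
        if len(line) > 1:  # skip blank lines
            tweet = json.loads(line.strip())
            # Long tweets are cut short in "text"; the full version is in "extended_tweet"
            if tweet.get("truncated"):
                print(tweet["extended_tweet"]["full_text"],"\n")
            elif "text" in tweet:  # ignore messages that are not tweets
                print(tweet["text"],"\n")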
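
The choice between the two text fields can be pulled into a small function, which makes it easy to test on hand-made data before pointing it at the real file. In this sketch `tweet_text` is a hypothetical helper and the sample lines are invented, not real tweets. The third line imitates a stream message that is not a tweet, such as a rate-limit notice.

```python
import json

def tweet_text(tweet):
    """Return the full text of a parsed tweet, or None if it has no text."""
    if tweet.get("truncated") and "extended_tweet" in tweet:
        return tweet["extended_tweet"]["full_text"]
    return tweet.get("text")

# Hand-made lines in the same one-object-per-line layout (not real tweets)
sample_lines = [
    json.dumps({"text": "short one", "truncated": False}),
    json.dumps({"text": "long one...", "truncated": True,
                "extended_tweet": {"full_text": "long one, in full"}}),
    json.dumps({"limit": {"track": 5}}),
]
texts = [tweet_text(json.loads(line)) for line in sample_lines]
print(texts)
```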