# Knogin Hunter's IPython Advanced Mode
## Netflows
This guide shows how to request data and how to display and analyze it with libraries such as pandas.


**Import Libraries**

The next cell imports the libraries that the functions below require. Do not remove it.

In [61]:
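import datetime
import json  # used later to serialize the query results for pandas
import pandas as pd
import numpy as np
import linecache, bisect
import os

# The last component of the current path is used as the date (YYYYMMDD).
spath = os.getcwd()
path = spath.split("/")
date = path[-1]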
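
The date lookup above relies on the notebook running from a folder whose name is a date in `YYYYMMDD` form. As a minimal sketch, assuming that layout, the same step can be wrapped in a small function that fails early when the folder name is not a valid date. The name `date_from_path` and the example path are hypothetical and not part of the notebook.

```python
import datetime
import os


def date_from_path(path):
    """Parse the last path component as a YYYYMMDD date.

    Hypothetical helper for illustration; raises ValueError if the
    folder name is not a valid date in that format.
    """
    folder = os.path.basename(os.path.normpath(path))
    return datetime.datetime.strptime(folder, '%Y%m%d').date()


# Example with a made-up path; in the notebook you would pass os.getcwd().
example = date_from_path('/some/notebooks/flow/20161008')
print(example.isoformat())
```

Validating the folder name here surfaces a wrong working directory immediately, rather than later as a failed query.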

**Request Data**

To request data we use GraphQL (a query language for APIs; more info at http://graphql.org/).

We provide a function that makes the data request; all you need is a query and its variables.


In [62]:
def makeGraphqlRequest(query, variables):
    return GraphQLClient.request(query, variables)

Now that we have a function, we can run a query like this:

*Note: There is no need to set the date for the query manually; by default the code reads the date from the current path.*

In [63]:
suspicious_query = """query($date:SpotDateType) {
                            flow {
                              suspicious(date:$date)
                                  {
                                      srcIp
                                      dstIp
                                      srcPort
                                      dstPort
                                      score
                                      srcIp_domain
                                      dstIp_rep
                                      protocol
                                      outBytes
                                      inPkts
                                      srcIp_rep
                                      inBytes
                                      srcIp_isInternal  
                                      rank 
                                      dstIp_geoloc
                                      tstart
                                      outPkts  
                                      dstIp_isInternal
                                      dstIp_domain
                                  }
                            }
                    }"""

## To use a different date for your query, swap the commented and
## uncommented 'date' lines below

variables={
    'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
#     'date': "2016-10-08"
    }
 
suspicious_request = makeGraphqlRequest(suspicious_query,variables)

##The variable suspicious_request will contain the resulting data from the query.
results = suspicious_request['data']['flow']['suspicious']
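
The line above indexes straight into the response, which raises a `KeyError` or `TypeError` if the request did not succeed. The sketch below shows a more defensive way to pull out the rows. It assumes the response is a dict in the standard GraphQL shape, with a `data` key and an optional `errors` list; whether `GraphQLClient.request` reports failures this way is an assumption. The helper name `extract_suspicious` is hypothetical.

```python
def extract_suspicious(response):
    """Return the list of suspicious flows from a GraphQL response dict.

    Hypothetical helper: assumes the standard GraphQL response shape
    ({'data': ..., 'errors': [...]}) and raises if errors are present.
    """
    errors = response.get('errors')
    if errors:
        messages = [
            str(e.get('message', e)) if isinstance(e, dict) else str(e)
            for e in errors
        ]
        raise RuntimeError('GraphQL request failed: ' + '; '.join(messages))
    # Any missing level falls back to an empty container.
    data = response.get('data') or {}
    flow = data.get('flow') or {}
    return flow.get('suspicious') or []


# Made-up response used only to show the call.
sample_response = {'data': {'flow': {'suspicious': [{'srcIp': '192.0.2.10'}]}}}
print(extract_suspicious(sample_response))
```

In the notebook this would be called as `results = extract_suspicious(suspicious_request)`.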


## Pandas DataFrames

The following cell loads the results into a pandas dataframe.

For more information on how to use pandas, see https://pandas.pydata.org/pandas-docs/stable/10min.html

In [64]:
df = pd.read_json(json.dumps(results))
## Print only the selected columns from the dataframe
## By default it will only print the first 15 results
print(df[['srcIp','dstIp','srcPort','dstPort','score']])

             srcIp            dstIp  srcPort  dstPort     score
0   65.108.207.181  157.245.204.195       22    62007  0.000096
1   65.108.207.181  157.245.204.195       22    62007  0.000096
2   65.108.207.181    190.7.216.137       22    49977  0.000096
3   65.108.207.181    190.7.216.137       22    49977  0.000096
4   65.108.207.181   162.62.191.231      443    45232  0.000096
5   65.108.207.181   162.62.191.231      443    45232  0.000096
6   65.108.207.181    64.62.197.185        0      768  0.000096
7   65.108.207.181    64.62.197.185        0      768  0.000096
8   65.108.207.181  157.245.204.195       22    51864  0.000096
9   65.108.207.181  157.245.204.195       22    51864  0.000096
10  65.108.207.181   104.152.52.189        0      771  0.000096
11  65.108.207.181   104.152.52.189        0      771  0.000096
12  65.108.207.181  157.245.204.195       22    61993  0.000096
13  65.108.207.181  157.245.204.195       22    61993  0.000096
14  65.108.207.181     104.16.20.35    3
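
As a small illustration of the loading step, a list of dictionaries like the one the query returns can also be passed directly to the `pd.DataFrame` constructor. The records below are made-up sample values (documentation IP ranges), not output from the system, and the name `sample_results` is hypothetical.

```python
import pandas as pd

# Made-up records shaped like the query results; values are illustrative only.
sample_results = [
    {'srcIp': '192.0.2.10', 'dstIp': '198.51.100.5', 'srcPort': 22, 'dstPort': 62007, 'score': 0.0001},
    {'srcIp': '192.0.2.10', 'dstIp': '198.51.100.6', 'srcPort': 443, 'dstPort': 45232, 'score': 0.0002},
    {'srcIp': '192.0.2.11', 'dstIp': '198.51.100.5', 'srcPort': 0, 'dstPort': 768, 'score': 0.0003},
]

# Each dictionary becomes one row; the keys become the column names.
sample_df = pd.DataFrame(sample_results)

# Select a subset of columns and show only the first two rows.
subset = sample_df[['srcIp', 'dstIp', 'dstPort', 'score']]
print(subset.head(2))
```

Using `head(n)` gives explicit control over how many rows are printed instead of relying on the display defaults.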

## Additional operations

Additional operations, such as sorting, filtering and grouping, can be performed on the dataframe.

**Filtering the data**

In [65]:
## Filter results where the destination port is 3389
## The resulting data will be stored in df2
## dstPort is a numeric column, so match the integer 3389, not the string '3389'

df2 = df[df['dstPort'].isin([3389])]
print(df2[['srcIp','dstIp','srcPort','dstPort','score']])

Empty DataFrame
Columns: [srcIp, dstIp, srcPort, dstPort, score]
Index: []
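
Filters can also be written as boolean masks and combined. The sketch below uses a made-up dataframe (the values and the name `sample_df` are illustrative, not real results) to show an equality filter, a combined condition, and a multi-value `isin` filter.

```python
import pandas as pd

# Made-up data shaped like the notebook's dataframe.
sample_df = pd.DataFrame({
    'srcIp': ['192.0.2.10', '192.0.2.10', '192.0.2.11'],
    'dstIp': ['198.51.100.5', '198.51.100.6', '198.51.100.5'],
    'dstPort': [3389, 443, 3389],
    'score': [0.0001, 0.02, 0.5],
})

# Equality filter: compare against an integer because dstPort is numeric.
rdp = sample_df[sample_df['dstPort'] == 3389]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
rdp_low_score = sample_df[(sample_df['dstPort'] == 3389) & (sample_df['score'] < 0.001)]

# isin matches any value in the list.
web_or_rdp = sample_df[sample_df['dstPort'].isin([443, 3389])]

print(rdp_low_score)
```

pandas treats the integer `3389` and the string `'3389'` as different values, so the type used in the filter has to match the column's type.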


**Ordering the data**

In [66]:
srtd = df.sort_values(by="rank")
print(srtd[['rank','srcIp','dstIp','srcPort','dstPort','score']])

    rank           srcIp            dstIp  srcPort  dstPort     score
0      0  65.108.207.181  157.245.204.195       22    62007  0.000096
1      0  65.108.207.181  157.245.204.195       22    62007  0.000096
2      1  65.108.207.181    190.7.216.137       22    49977  0.000096
3      1  65.108.207.181    190.7.216.137       22    49977  0.000096
4      4  65.108.207.181   162.62.191.231      443    45232  0.000096
5      4  65.108.207.181   162.62.191.231      443    45232  0.000096
6      5  65.108.207.181    64.62.197.185        0      768  0.000096
7      5  65.108.207.181    64.62.197.185        0      768  0.000096
8      6  65.108.207.181  157.245.204.195       22    51864  0.000096
9      6  65.108.207.181  157.245.204.195       22    51864  0.000096
11     9  65.108.207.181   104.152.52.189        0      771  0.000096
10     9  65.108.207.181   104.152.52.189        0      771  0.000096
12    10  65.108.207.181  157.245.204.195       22    61993  0.000096
13    10  65.108.207
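
Sorting can use more than one column, each with its own direction. The following sketch uses made-up values (the dataframe `sample_df` is illustrative only) to sort by `rank` ascending and break ties by `score` descending, and to pick the rows with the lowest rank.

```python
import pandas as pd

# Made-up data; only the columns needed for the example.
sample_df = pd.DataFrame({
    'rank': [4, 0, 1, 0],
    'srcIp': ['192.0.2.10', '192.0.2.11', '192.0.2.12', '192.0.2.13'],
    'score': [0.3, 0.1, 0.2, 0.5],
})

# Sort by rank (ascending), then by score (descending) within equal ranks.
by_rank_then_score = sample_df.sort_values(by=['rank', 'score'], ascending=[True, False])

# Shortcut for "the n rows with the smallest value in a column".
top_two = sample_df.nsmallest(2, 'rank')

print(by_rank_then_score)
```

`sort_values` returns a new dataframe and keeps the original index labels, which is why the index column in sorted output can appear out of order.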

**Grouping the data**

In [67]:
## This command groups the results by source-destination IP pair,
## summing all other columns
grpd = df.groupby(['srcIp','dstIp']).sum()
## Print the resulting dataframe, showing the input bytes and input packets columns
print(grpd[["inBytes","inPkts"]])

                                 inBytes  inPkts
srcIp          dstIp                            
65.108.207.181 104.152.52.189        224       4
               104.16.20.35        30922     514
               104.16.27.35        67892    1006
               118.123.105.87      22096      32
               140.82.121.4        20164      54
               141.30.62.23       110716    1644
               151.101.244.209   2585252   26820
               157.245.204.195  18612156   25316
               162.62.191.231        960      16
               190.7.216.137        1968      10
               34.107.166.226        480      12
               52.217.92.198       41286    1022
               64.62.197.185        1152       2
               69.67.150.36          350       2
               78.193.124.169      99248     210
77.83.242.56   65.108.207.136       1200      20
95.217.255.69  65.108.207.187       2340      18
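
A variation on the grouping above is to select the numeric columns before summing, so that text columns are left out of the aggregation regardless of how the installed pandas version handles them. The sketch below uses made-up values; `sample_df` and the result names are illustrative.

```python
import pandas as pd

# Made-up flows: two between the same pair of hosts, one between another pair.
sample_df = pd.DataFrame({
    'srcIp': ['192.0.2.10', '192.0.2.10', '192.0.2.11'],
    'dstIp': ['198.51.100.5', '198.51.100.5', '198.51.100.6'],
    'inBytes': [100, 250, 40],
    'inPkts': [2, 5, 1],
})

pairs = sample_df.groupby(['srcIp', 'dstIp'])

# Sum only the numeric columns of interest for each source-destination pair.
totals = pairs[['inBytes', 'inPkts']].sum()

# Count how many flows each pair contributed.
flow_counts = pairs.size()

# List the pairs with the most input bytes first.
busiest = totals.sort_values('inBytes', ascending=False)
print(busiest)
```

The grouped result is indexed by the (`srcIp`, `dstIp`) pair, so a single pair can be looked up with `totals.loc[('192.0.2.10', '198.51.100.5')]`.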


**Reset Scored Connections**

Uncomment and execute the following cell to reset all scored connections for this day.

In [68]:
# reset_scores = """mutation($date:SpotDateType!) {
#                   flow{
#                       resetScoredConnections(date:$date){
#                       success
#                       }
#                   }
#               }"""


# variables={
#     'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
#     }
 
# request = makeGraphqlRequest(reset_scores,variables)

# print(request['data']['flow']['resetScoredConnections']['success'])


## Sandbox

At this point you can perform your own analysis using the previously provided functions as a guide.

Happy threat hunting!

In [69]:
# Your code here