
# Detailed explanations of how this proxy works, and the thought process behind it, are [here](#how-a-https-connection-works-with-the-proxy-in-steps)
<br>

## Useful links:


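<ul>
    <li><h2>HTTP tunnel and CONNECT method explanation:</h2>
        <h3><a href="https://en.wikipedia.org/wiki/HTTP_tunnel#HTTP_CONNECT_tunneling">https://en.wikipedia.org/wiki/HTTP_tunnel#HTTP_CONNECT_tunneling</a></h3>
        <h3><a href="https://reqbin.com/Article/HttpConnect">https://reqbin.com/Article/HttpConnect</a></h3>
    </li>
    <li><h2>Explanation of how a proxy works:</h2>
        <h3><a href="https://docs.mitmproxy.org/stable/concepts-howmitmproxyworks">https://docs.mitmproxy.org/stable/concepts-howmitmproxyworks</a></h3>
    </li>
    <li><h2>TLS handshake explanation:</h2>
        <h3><a href="https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake">https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake</a></h3>
        <h2>More in-depth explanation of the TLS handshake:</h2>
        <h3><a href="https://www.cisco.com/c/en/us/support/docs/security-vpn/secure-socket-layer-ssl/116181-technote-product-00.html">https://www.cisco.com/c/en/us/support/docs/security-vpn/secure-socket-layer-ssl/116181-technote-product-00.html</a></h3>
        <h2>Also, useful info on the TLS handshake and the 'Client Hello' message:</h2>
        <h3><a href="https://stackoverflow.com/questions/3897883/how-to-detect-an-incoming-ssl-https-handshake-ssl-wire-format">https://stackoverflow.com/questions/3897883/how-to-detect-an-incoming-ssl-https-handshake-ssl-wire-format</a></h3>
        <h3><a href="https://security.stackexchange.com/questions/34780/checking-client-hello-for-https-classification">https://security.stackexchange.com/questions/34780/checking-client-hello-for-https-classification</a></h3>
    </li>
    <li><h2>Some additional information on HTTPS proxies:</h2>
        <h3><a href="https://superuser.com/questions/1098988/confusion-about-https-proxies">https://superuser.com/questions/1098988/confusion-about-https-proxies</a></h3>
    </li>
    <li><h2>Some discussion about Python socket speed:</h2>
        <h3><a href="https://trio.discourse.group/t/why-are-python-sockets-so-slow-and-what-can-be-done/121">https://trio.discourse.group/t/why-are-python-sockets-so-slow-and-what-can-be-done/121</a></h3>
        <h3><a href="https://stackoverflow.com/questions/12469121/python-tcp-socket-proxy-extremely-slow">https://stackoverflow.com/questions/12469121/python-tcp-socket-proxy-extremely-slow</a></h3>
    </li>
</ul>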

In [1]:
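from _thread import start_new_thread  # low-level thread API; threading.Thread is the higher-level alternative
import socket
import traceback
import sys
import ssl
import re

# Constants
LOCALHOST = ''      # '' binds to all interfaces (INADDR_ANY), not only 127.0.0.1
SERVER_PORT = 8080  # default port value
BUFFER_SIZE = 4096  # bytes read per recv() call
MAX_CONNS = 100     # backlog passed to listen()
TIMEOUT = 10        # seconds
global_list = []    # scratch list for captured requests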

In [23]:
# Connecting the proxy's 'server' socket with the client (first attempt)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((LOCALHOST, SERVER_PORT))
    print(f"Server started on LOCALHOST:{SERVER_PORT}.")
    server.listen(MAX_CONNS)
    print("Listening for connections...")
    # conn receives the data (request headers) from the browser; it also sends data (e.g. the remote host's response) back to the browser
    conn, addr = server.accept()
    with conn:
        print("Connected by:", addr)
        while True:
            data = conn.recv(BUFFER_SIZE)
            if not data:
                break

            print('\n', data, '\n')
            if data[:7] == b'CONNECT':
                start_new_thread(httpsconn_thread_wrapper, (conn, data))
            # NOTE: this loop keeps calling conn.recv() after handing conn to the thread, so it
            # competes with the thread for the browser's data (it printed the Client Hello below),
            # and leaving `with conn:` closes the socket while the thread is still using it,
            # hence the "Bad file descriptor" error in the output.
    

Server started on LOCALHOST:8080.
Listening for connections...
Connected by: ('192.168.0.20', 51868)

 b'CONNECT docs.python.org:443 HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0\r\nProxy-Connection: keep-alive\r\nConnection: keep-alive\r\nHost: docs.python.org:443\r\n\r\n' 

Connected to docs.python.org
Browser conn: 
<socket.socket fd=61, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.0.20', 8080), raddr=('192.168.0.20', 51868)>

 b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03~\xb4\x94\xb3b\'2\x07\xee\xdd\xfd\x1dR=\xdf\t\x0c\x8a\nFk6\x07\xf3x\xfad{X\xd5\xbc\xe5 \xd8\x8c \xd35\\\x8e\x95\x83`]z\xcc\t\xe8\xf1\xfd\x87\x91\xd7Bx\x7fr\xea\xe4H\xc3\x1b\xf0\xac\x91\x00$\x13\x01\x13\x03\x13\x02\xc0+\xc0/\xcc\xa9\xcc\xa8\xc0,\xc00\xc0\n\xc0\t\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00\n\x01\x00\x01\x8f\x00\x00\x00\x14\x00\x12\x00\x00\x0fdocs.python.org\x00\x17\x00\x00\xff\x01\x00\x01\x00\x00\n\x00\x0

Exception ignored in thread started by: <function httpsconn_thread_wrapper at 0x7f7b643e4ee0>
Traceback (most recent call last):
  File "<ipython-input-8-73f284b96d14>", line 18, in httpsconn_thread_wrapper
  File "<ipython-input-22-1d524bc158a4>", line 51, in resolve
OSError: [Errno 9] Bad file descriptor



Remote host response: 
b''


In [22]:
class HTTPSConnection:
    def __init__(self, connection_data, conn, ssl_sock):
        self.connection_data = connection_data
        self.browser_conn = conn
        self.ssl_sock = ssl_sock
    
    def connect_to_remote(self):
        pass
    
    @staticmethod
    def recvall(sock):
        """
        Receive all data from the socket. This method continues to recieve until all data is received.
        """
        buff_size = 4096
        data = b''
        while True:
            chunk = sock.recv(buff_size)
            data += chunk
            if not chunk or len(chunk) < buff_size:
                break
        
        return data
    
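    def resolve(self):
        """
        Resolves the HTTPS connection (first attempt).

        The loop below forwards browser data until the browser closes its side.
        During a TLS handshake the browser instead waits for the Server Hello,
        so this one-direction-at-a-time approach can stall.
        """
        # forward the browser's request to the remote host and get the host's response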
#         request_data = HTTPSConnection.recvall(self.browser_conn)
        print('Browser conn: ')
        print(self.browser_conn)
        while True:
            reply = self.browser_conn.recv(4096)
            print(reply)
            if len(reply) > 0:
                self.ssl_sock.send(reply)
            else:
                break
        print('SSL SOCK: ')
        print('\n', self.ssl_sock,'\n')
#         print('Request_data: ')
#         print(request_data)
#         self.ssl_sock.send(request_data)
        
        print('Browser conn: ')
        print('\n', self.browser_conn)
        response_data = HTTPSConnection.recvall(self.ssl_sock)
        print("\nRemote host response: ")
        print(response_data)
        self.browser_conn.send(response_data)
        

In [8]:
def httpsconn_thread_wrapper(conn, data):
    conn_data = ConnectionData(data)   # parse the CONNECT header
    
    # The commented-out version below wrapped the remote socket in TLS. A CONNECT tunnel does not
    # need that: the browser performs the TLS handshake with the remote host itself.
#     context = ssl.create_default_context()
#     with socket.create_connection((conn_data.hostname, conn_data.port)) as s:
#         with context.wrap_socket(s, server_hostname=conn_data.hostname) as ssl_sock:
#             print(f"Connected to {conn_data.hostname}")
#             conn.send('200 OK'.encode())  # if the connection was successful, inform the browser
#             https_connection = HTTPSConnection(conn_data, conn, ssl_sock)
#             https_connection.resolve()

    # sock sends data to the remote host and receives the remote host's response
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.connect((conn_data.hostname, conn_data.port))
        print(f"Connected to {conn_data.hostname}")
        conn.send(b'HTTP/1.1 200 Connection established\r\n\r\n')  # if the connection was successful, inform the browser
        https_connection = HTTPSConnection(conn_data, conn, sock)
        https_connection.resolve()


In [4]:
def connect_to_remote(conn, conn_data):
    """
    Connects to the remote web server over TLS.
    """
    conn_data.print_data()
    # `context` was never defined in this cell; a default client-side TLS context is assumed here
    context = ssl.create_default_context()

    # ssl_sock sends data to the remote host and receives the remote host's response
    with socket.create_connection((conn_data.hostname, conn_data.port)) as s:
        with context.wrap_socket(s, server_hostname=conn_data.hostname) as ssl_sock:
            print(f"Connected to {conn_data.hostname}")
            conn.send(b'HTTP/1.1 200 Connection established\r\n\r\n')  # if the connection was successful, inform the browser
            

In [6]:
class ConnectionData:
    hostname: str
    port: int
    user_agent: str
    protocol: str
    
    def __init__(self, headers: bytes):
        self.headers = headers
        self.parse_headers()  # parse the headers on initialization
        
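    def parse_headers(self):
        headers = self.headers.decode()
        split = headers.split('\r\n')

        # e.g. "Host: docs.python.org:443" -> hostname and port
        host_re = r'Host:\s([a-z0-9\.\-]+):(\d+)'
        match = re.search(host_re, headers, re.IGNORECASE)  # hostnames are case-insensitive
        if match is None:
            raise ValueError('no "Host: <hostname>:<port>" header found')
        self.hostname = match.group(1)
        self.port = int(match.group(2))
        # look the User-Agent header up by name instead of assuming it is the second line
        self.user_agent = next(
            (line.split(': ', 1)[1] for line in split if line.startswith('User-Agent: ')), '')
        self.protocol = split[0].split(' ')[-1]  # e.g. "HTTP/1.1", the last token of the request line

    def print_data(self):
        print('Hostname:', self.hostname)
        print('Port:', self.port)
        print('User-Agent:', self.user_agent)
        print('Protocol:', self.protocol)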

In [47]:
print(global_list[0])

b'CONNECT realpython.com:443 HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0\r\nProxy-Connection: keep-alive\r\nConnection: keep-alive\r\nHost: realpython.com:443\r\n\r\n'


In [98]:
c = ConnectionData(global_list[0])

In [99]:
c.print_data()

Hostname: realpython.com
Port: 443
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0
Protocol: HTTP/1.1


### Connecting to a free proxy using the socket module (example):

In [None]:
ip = '132.145.88.184'  # random free proxy, obtained from free proxy list websites
port = 3128
# connecting to python docs
CONNECT = 'CONNECT docs.python.org:443 HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0\r\nProxy-Connection: keep-alive\r\nConnection: keep-alive\r\nHost: docs.python.org:443\r\n\r\n'
request_header = b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03\x9f\xb8P\xf9\xd4\x94\x86\n\xd3\xda\xae\xf9\xecn\xb2E\xb3\x1a\x1e^h\x96\x83\xff:c\x92\xd3h8\xe6\xd4 \xf9\xb1\x01\xfa\xceb\xe8\xf2\xc5\x9fMF2\xa0R\x97Jd\x82\xe0H\xdea`\x9a\xfd\x8fb\xca\x02\xe8\xbc\x00$\x13\x01\x13\x03\x13\x02\xc0+\xc0/\xcc\xa9\xcc\xa8\xc0,\xc00\xc0\n\xc0\t\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00\n\x01\x00\x01\x8f\x00\x00\x00\x14\x00\x12\x00\x00\x0fdocs.python.org\x00\x17\x00\x00\xff\x01\x00\x01\x00\x00\n\x00\x0e\x00\x0c\x00\x1d\x00\x17\x00\x18\x00\x19\x01\x00\x01\x01\x00\x0b\x00\x02\x01\x00\x00\x10\x00\x0e\x00\x0c\x02h2\x08http/1.1\x00\x05\x00\x05\x01\x00\x00\x00\x00\x00"\x00\n\x00\x08\x04\x03\x05\x03\x06\x03\x02\x03\x003\x00k\x00i\x00\x1d\x00 \x9f-.\xf9:G\xa3\xb6\x97\x9c_X\xb3\x0f\xf4&Y\x15\x80[=\xa8C\xe4\xcd\xf0\x92\xb3j\xa2\xaf2\x00\x17\x00A\x04\r]\xab?\t\x80(\x8c\xff\xeb^eT\xd7\xbd\xa2\x92\x1f[+\xb7\x93\x1d\xee1\x02S3\xeeD-AD\xd5\xbb\xd8\xfb\xbd\x8ct|E\x97\x12_\xe2\xc1\x8d\xaf\x1b\xb4\x0f\xf6\xae\xf7>\xd3\xad\x90\x16\x0c^\xdaM\x00+\x00\x05\x04\x03\x04\x03\x03\x00\r\x00\x18\x00\x16\x04\x03\x05\x03\x06\x03\x08\x04\x08\x05\x08\x06\x04\x01\x05\x01\x06\x01\x02\x03\x02\x01\x00\x1c\x00\x02@\x01\x00\x15\x00\x8f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

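sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(TIMEOUT)  # free proxies are often slow or offline; don't wait forever
try:
    sock.connect((ip, port))
    print(sock)
    # 1. ask the proxy to open a tunnel to docs.python.org:443
    sock.sendall(CONNECT.encode())
    reply = sock.recv(BUFFER_SIZE)
    print(reply)
    # 2. only continue if the proxy answered "HTTP/1.1 200 ..." (tunnel open)
    if reply.split(b' ')[1:2] == [b'200']:
        # 3. send the recorded Client Hello through the tunnel, exactly once
        sock.sendall(request_header)
        # 4. print what the remote server sends back until it closes the connection
        while True:
            reply = sock.recv(BUFFER_SIZE)
            if not reply:
                break
            print(reply)
    sock.shutdown(socket.SHUT_RDWR)
except socket.error as e:  # socket.error is OSError; this also covers socket.timeout
    print(e)
    traceback.print_exc()
except Exception as e:
    print('[!] Unexpected error: ', e)
finally:
    sock.close()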

# How a HTTPS connection works with the proxy, in steps:
#### *Additional information on how the proxy handles `keep-alive` connections is [here](#how-to-handle-keep-alivehttps-connections)*
---

1. #### A server socket is created and bound to an address.
2. #### The server socket listens for incoming connections.
3. #### The browser connects to the server socket.
4. #### Accepting the browser's connection creates a new socket object, which is used to receive and send data on that connection (receive requests from the browser, and later send responses back to it).
5. #### This new socket, `client_conn`, receives data from the browser.
6. #### The browser sends an HTTP CONNECT request, asking the proxy to open a tunnel to the desired destination.
7. #### The CONNECT request is parsed.
8. #### A new socket, `remote_conn`, is created. From now on, this socket is used to receive data from, and send data to, the remote host.
9. #### The new socket opens a TCP connection to the remote host specified in the CONNECT request.
10. #### If the connection was successful, `client_conn` sends a `200 Connection established` response to the browser, telling it that the tunnel is open.
11. #### The browser sends a 'Client Hello' message to the remote host, through the tunnel.
12. #### The remote host responds with a 'Server Hello' message.
13. #### The TLS handshake proceeds and completes. To see exactly what is going on, click [here](https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/).
14. #### The proxy keeps relaying data between the two sockets. Example:<br> Mozilla (browser) --> `client_conn` --> `remote_conn` --> www.google.com --> `remote_conn` --> `client_conn` --> Mozilla
---
#### This is different from a 'man-in-the-middle' (MITM) proxy, as this proxy **only** tunnels the data. Because the data is encrypted, the proxy does not know what is displayed in the browser; it only knows which host to connect to. The data is decrypted by the browser.
#### To decrypt the data exchanged between the browser and the web server, the proxy would need its own certificates, which the browser would have to trust. In that case the proxy would receive data from the browser, decrypt it, inspect it, encrypt it again and send it to the server.

# How a HTTPS connection works with the proxy:

## 1. Creating a server socket:

#### First, a server socket is created using a context manager:
```py
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
```
<h4>The server socket listens for connections on a specific port and accepts them. Its address is what the browser uses to connect to the proxy.</h4>

### **Accepting connections and the client socket:**
#### We need to accept incoming connections, for example a browser connecting to our server. <br> `conn, addr = server.accept()` accepts a connection; `conn` is a newly created socket object responsible for all communication with the browser, both receiving data from it and sending data to it.
#### As the browser initializes a new connection to the server each time it wants to connect to a new remote host, that code needs to be inside a `while True` loop, which is the main loop that keeps the proxy server running.
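A minimal sketch of that main loop, assuming `threading.Thread` in place of `_thread.start_new_thread` (the `threading` module is the higher-level API for the same job). `handle_client`, `serve` and the `max_accepts` parameter are hypothetical names; `max_accepts` only exists so the loop can be stopped in a demo.

```python
import socket
import threading


def handle_client(conn, addr):
    """Hypothetical per-connection handler: echo one chunk back, then close."""
    with conn:
        conn.sendall(conn.recv(4096))


def serve(server, handler, max_accepts=None):
    """Accept connections in a loop and hand each one to `handler` in its own thread."""
    accepted = 0
    while max_accepts is None or accepted < max_accepts:
        conn, addr = server.accept()  # blocks until a browser connects
        # daemon=True: these threads won't keep the interpreter alive on shutdown
        threading.Thread(target=handler, args=(conn, addr), daemon=True).start()
        accepted += 1                 # the main loop goes straight back to accept()


# usage: a throwaway server on a free port, serving a single connection
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 0))     # port 0 lets the OS choose a free port
    server.listen()
    threading.Thread(target=serve, args=(server, handle_client, 1), daemon=True).start()
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(b'hello')
        reply = client.recv(4096)
print(reply)  # b'hello'
```

Because each accepted connection gets its own thread, a slow tunnel does not stop the main loop from accepting the browser's next connection.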

## Differentiating between HTTP and HTTPS requests
#### To determine which kind of request the browser wants to make, we parse the first chunk of data that the browser sends to the `conn` socket.
* #### For an HTTPS request, the browser always sends the CONNECT method, which tells the proxy to open a TCP tunnel to the specified server. The proxy then just relays encrypted data between the browser and the remote server; the encryption is handled by the browser.
* #### For an HTTP request, the CONNECT method is not used. Since the data is not encrypted, the proxy can see the GET and POST methods, so it just needs to parse the request data and determine which server it should connect to.
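A small sketch of this check, assuming the method is taken as the first space-separated token of the request line. `classify_request` is a hypothetical name, and the extra HTTP methods beyond GET and POST are an assumption, not something the proxy above handles.

```python
def classify_request(data: bytes):
    """Return 'https' for a CONNECT request, 'http' for other known methods, None otherwise."""
    # the method is the first token of the request line, e.g. b'CONNECT host:443 HTTP/1.1'
    method = data.split(b' ', 1)[0]
    if method == b'CONNECT':
        return 'https'
    if method in (b'GET', b'POST', b'HEAD', b'PUT', b'DELETE', b'OPTIONS', b'PATCH'):
        return 'http'
    return None  # not an HTTP request line (e.g. raw TLS bytes)


print(classify_request(b'CONNECT docs.python.org:443 HTTP/1.1\r\n\r\n'))  # https
```

Comparing the whole token is slightly stricter than slicing with `data[:4] == b'POST'`, which would also accept a made-up method such as `b'POSTX'`.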

## 2. Connecting to the remote web server and threading
#### Each new connection is started in a new thread:
```py
if data[:7] == b'CONNECT':
    start_new_thread(https_thread_wrapper, (conn, data, 'https'))
elif data[:3] == b'GET' or data[:4] == b'POST':
    start_new_thread(http_thread_wrapper, (conn, data, 'http'))
```
#### There are two classes, `HttpsConnection` and `HttpConnection`, used to initialize connections to the remote server. The main difference between them is how they handle the communication between the sockets. Each has `__enter__` and `__exit__` methods to make sure the connection is properly closed after the communication is complete. Because of that, a connection can be initialized with a context manager:
```py
with HttpsConnection(conn, connection_data) as https_conn:
    https_conn.serve()
```
or
```py
with HttpConnection(conn, connection_data) as http_conn:
    http_conn.resolve()
```
#### Each connection is created inside a wrapper function for threading, so each connection runs in its own thread.



## 3. Connection classes and their methods.
### `__enter__` method for HTTPS:
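```py
def __enter__(self):
    # socket used to talk to the remote web server
    self.remote_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    self.remote_sock.setblocking(0)
    # settimeout() overrides the setblocking(0) above: the socket is now in timeout mode,
    # so connect() and recv() wait at most 7 seconds, then raise socket.timeout
    self.remote_sock.settimeout(7)
    # browser socket in non-blocking mode: recv() raises BlockingIOError if nothing is waiting
    self.browser_conn.setblocking(0)
    self.remote_sock.connect((self.connection_data.hostname, self.connection_data.port))
    # the tunnel is open: tell the browser it can start sending data
    self.browser_conn.send(b'HTTP/1.1 200 Connection established\r\n\r\n')
    return self
```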
#### First, a new socket is created; it is used to communicate with the remote web server. Next, a timeout is set with `settimeout(7)`. In Python this puts the socket into *timeout mode* (overriding an earlier `setblocking(0)` call): `connect()` and `recv()` wait at most 7 seconds and then raise `socket.timeout`, instead of waiting indefinitely for data that may never arrive and hanging the connection. Finally, the socket is connected to the remote web server.
#### If the connection was successful, the browser needs to be informed with a `200 Connection established` response. That tells the browser that the connection was established and that it can start sending data to the remote server.
### `__exit__` method: 
#### The exit method will close the open sockets:<br> `self.browser_conn.close()` for the browser-side socket, and <br>`self.remote_sock.close()` for the remote-server-side socket.
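A minimal, self-contained sketch of such a context-manager class, assuming `socket.create_connection` for the connect step. `TunnelConnection` is a hypothetical stand-in, not the project's `HttpsConnection`, and the `502 Bad Gateway` reply on failure is an added assumption.

```python
import socket


class TunnelConnection:
    """Hypothetical, minimal stand-in for the HTTPS connection class described above."""

    def __init__(self, browser_conn, hostname, port, timeout=7):
        self.browser_conn = browser_conn
        self.hostname = hostname
        self.port = port
        self.timeout = timeout
        self.remote_sock = None

    def __enter__(self):
        try:
            # resolves the hostname, connects, and leaves the socket in timeout mode
            self.remote_sock = socket.create_connection((self.hostname, self.port), timeout=self.timeout)
        except OSError:
            # __exit__ is NOT called when __enter__ raises, so clean up the browser side here
            self.browser_conn.sendall(b'HTTP/1.1 502 Bad Gateway\r\n\r\n')
            self.browser_conn.close()
            raise
        # only reached if the connection succeeded: tell the browser the tunnel is open
        self.browser_conn.sendall(b'HTTP/1.1 200 Connection established\r\n\r\n')
        return self

    def __exit__(self, exc_type, exc, tb):
        # close both sockets, even if an exception escaped the with-block
        for sock in (self.remote_sock, self.browser_conn):
            if sock is not None:
                sock.close()
        return False  # don't swallow exceptions
```

The design point worth noting is the `try`/`except` in `__enter__`: Python only calls `__exit__` if `__enter__` returned normally, so a failed connect has to close the browser socket itself.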

### **Difference between the Https Connection and the Http Connection:**
#### The main difference between these two connections is the way they communicate with the remote server. 
#### For HTTP, it is easy to resolve the connection: receive the request headers from the browser and send them to the remote web server. After that, receive all data from the remote server and send it to the browser in one go, and the request is complete:
```py
self.remote_sock.send(self.connection_data.headers)
while True:
    try:
        data = self.remote_sock.recv(self.buffer_size)
    except socket.error:
        # socket.error (OSError) also covers socket.timeout: treat a timeout as "no more data"
        data = b''
    if data:
        self.browser_conn.send(data)
    else:
        # b'' means the server closed the connection (or the timeout above expired)
        break
```
#### For HTTPS it's a bit more complicated: the communication happens in [multiple steps](https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake), so we can't just receive and send the data in one go. Instead, inside a `while` loop, we check whether either socket has something to transmit and, if so, send it to the other one. If neither side (browser or remote server) has anything to say, we break the loop and end the connection.
#### This is done in the `HttpsConnection.serve()` method.
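A sketch of that two-way relay, assuming `select.select` to wait on both sockets at once. `relay` and its `idle_timeout` parameter are hypothetical names, not the project's `serve()`; it also assumes both sockets are in blocking or timeout mode, since `sendall()` on a non-blocking socket can raise `BlockingIOError`.

```python
import select
import socket


def relay(browser_conn, remote_sock, idle_timeout=60.0, buffer_size=4096):
    """Copy bytes both ways until one side closes or both stay silent for idle_timeout s."""
    # for each socket, the socket its data should be forwarded to
    peers = {browser_conn: remote_sock, remote_sock: browser_conn}
    while True:
        # block until at least one socket is readable (data or EOF), or the timeout expires
        readable, _, _ = select.select(list(peers), [], [], idle_timeout)
        if not readable:
            return 'idle'                # neither side sent anything: end the tunnel
        for sock in readable:
            data = sock.recv(buffer_size)
            if not data:                 # b'' means this side closed the connection
                return 'closed'
            peers[sock].sendall(data)    # forward unchanged; TLS bytes stay encrypted
```

Waiting on both sockets avoids the stall of reading only from the browser: during the handshake the Server Hello is forwarded as soon as it arrives. Ending the whole tunnel on the first `b''` is a simplification; it does not propagate half-closed connections.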

# **Proxying the Browser Connection:**
<p>
<ul>
    <li><h3>The browser sends the CONNECT request.</h3></li>
    <li><h3>The proxy parses the CONNECT request and determines which host it should connect to.</h3></li>
    <li><h3>If the proxy connected successfully to the remote host, it replies to the browser with a '200 Connection established' message.</h3></li>
    <li><h3>Once the browser knows that the connection has been made, it sends the 'Client Hello' message.</h3></li>
    <li><h3>The proxy forwards the 'Client Hello' to the remote host, and the remote host replies with the 'Server Hello' message.</h3></li>
    <li><h3>The proxy does not need to interpret any further messages; it just keeps tunneling the data.</h3></li>
</ul>
</p>
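The target of a CONNECT request also appears in the request line itself (`CONNECT docs.python.org:443 HTTP/1.1`, the "authority form" `host:port`), so the proxy can parse it without relying on the `Host` header. A minimal sketch; `parse_connect_line` is a hypothetical name, and bracketed IPv6 literals are handled only minimally.

```python
def parse_connect_line(request: bytes):
    """Return (hostname, port, protocol) from the first line of a CONNECT request."""
    request_line = request.split(b'\r\n', 1)[0].decode('ascii')
    method, target, protocol = request_line.split(' ')
    if method != 'CONNECT':
        raise ValueError(f'not a CONNECT request: {request_line!r}')
    # rpartition splits on the LAST ':' so '[::1]:443' keeps the IPv6 address intact
    host, _, port = target.rpartition(':')
    return host.strip('[]'), int(port), protocol


print(parse_connect_line(b'CONNECT docs.python.org:443 HTTP/1.1\r\nHost: docs.python.org:443\r\n\r\n'))
# ('docs.python.org', 443, 'HTTP/1.1')
```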

In [1]:
# the proxy can tell whether a message is a 'Client Hello' by inspecting its first bytes:
# byte 0 is 0x16 (TLS handshake record) and byte 5 is 0x01 (handshake type Client Hello)

In [53]:
# This is the browser's whole 'Client Hello' message
client_hello = b'''\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03T@3\xb8T\x9fE\xbb \x9a,\xeb\x9cn\xdd\x85KXd\x82\x18\xea)\xf3bO\xf9\\W\xf7\xd4} \xae\xee!\xd9-\xcc\x11\xcc\xec\x86\xb2\x9d9\xa7\x84\tT\xac\x10\x92\xaa\x94\xbd\xb0\x12aZ\xb6\r\x04\xd8U\x00$\x13\x01\x13\x03\x13\x02\xc0+\xc0/\xcc\xa9\xcc\xa8\xc0,\xc00\xc0\n\xc0\t\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00\n\x01\x00\x01\x8f\x00\x00\x00\x19\x00\x17\x00\x00\x14trio.discourse.group\x00\x17\x00\x00\xff\x01\x00\x01\x00\x00\n\x00\x0e\x00\x0c\x00\x1d\x00\x17\x00\x18\x00\x19\x01\x00\x01\x01\x00\x0b\x00\x02\x01\x00\x00#\x00\x00\x00\x10\x00\x0e\x00\x0c\x02h2\x08http/1.1\x00\x05\x00\x05\x01\x00\x00\x00\x00\x00"\x00\n\x00\x08\x04\x03\x05\x03\x06\x03\x02\x03\x003\x00k\x00i\x00\x1d\x00 4\xd6\xd2\x13\xfc\xf8\xc0\xd5\xa2\xc6\x8e\x18\xfa\xf8\xff\xe9\x0c\x0e\x8e\xa4\xcepDm(\x02\xe4!\x91\xc2\xc2x\x00\x17\x00A\x04x\xd7I\x15\x8b\xf1\x07\x1dz\xe1+h1y\x183\x04c\x85XPP@\xee\x9e\xef\x91\x03C\x158\x07\x9ei\xd6\xd8K\xaf\x90\x99M\xf1\xbb\x19\xdb\xc0\xdfQb\xfc\xe1\x81+\x9b%L!\xf7\\Fx\x84\x93(\x00+\x00\x05\x04\x03\x04\x03\x03\x00\r\x00\x18\x00\x16\x04\x03\x05\x03\x06\x03\x08\x04\x08\x05\x08\x06\x04\x01\x05\x01\x06\x01\x02\x03\x02\x01\x00-\x00\x02\x01\x01\x00\x1c\x00\x02@\x01\x00\x15\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'''

In [17]:
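# This is the start of the message: the 5-byte TLS record header followed by the handshake header.
# \x02\x00 (bytes 3-4) is the record length (512); \x00\x01\xfc (bytes 6-8) is the Client Hello length (508)
len('\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03T@3\xb8T\x9fE\xbb')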

19

In [66]:
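client_hello[0] == 0x16  # True: TLS handshake record (indexing bytes returns an int, so compare with 0x16, not b'\x16')
client_hello[5] == 0x01  # True: handshake type 1 = Client Hello
client_hello[1:3]        # record-layer version, b'\x03\x01' (TLS 1.0)
length = client_hello[3:5]  # record length, b'\x02\x00'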

In [98]:
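# bytes.hex() turns each byte into two hex digits (b'\x02\x00' -> '0200');
# a '0' + str(i) loop only works by accident, when every byte value is below 10
hex_ = length.hex()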

print(hex_)

0200


In [100]:
int(hex_, 16)

512

In [92]:
hex(512)

'0x200'
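Instead of building a hex string, `int.from_bytes()` converts big-endian bytes to an integer directly. A sketch that decodes both headers at the start of the recorded Client Hello; `parse_client_hello_header` is a hypothetical helper. The record length (512) is the 4-byte handshake header plus the 508-byte Client Hello body.

```python
def parse_client_hello_header(data: bytes):
    """Decode the 5-byte TLS record header and the 4-byte handshake header after it."""
    if len(data) < 9:
        raise ValueError('need at least 9 bytes')
    content_type = data[0]                                 # 0x16 = handshake (an int, not bytes)
    record_version = (data[1], data[2])                    # (3, 1) = TLS 1.0 on the record layer
    record_length = int.from_bytes(data[3:5], 'big')       # b'\x02\x00' -> 512
    handshake_type = data[5]                               # 0x01 = Client Hello
    handshake_length = int.from_bytes(data[6:9], 'big')    # b'\x00\x01\xfc' -> 508
    return {
        'is_client_hello': content_type == 0x16 and handshake_type == 0x01,
        'record_version': record_version,
        'record_length': record_length,
        'handshake_length': handshake_length,
    }


header = b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc'  # first 9 bytes of the recorded Client Hello
print(parse_client_hello_header(header))
# {'is_client_hello': True, 'record_version': (3, 1), 'record_length': 512, 'handshake_length': 508}
```

Browsers often put `03 01` in the record header even when they offer TLS 1.3; the versions actually offered are listed inside the Client Hello itself.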

In [None]:
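# record-layer versions (bytes 1-2 of a TLS record header)
versions = {b'\x03\x00': 'SSL 3.0', b'\x03\x01': 'TLS 1.0', b'\x03\x02': 'TLS 1.1', b'\x03\x03': 'TLS 1.2'}

def recv_exact(sock, n):
    """recv() may return fewer than n bytes, so keep reading until n bytes have arrived."""
    data = b''
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError('connection closed in the middle of a TLS record')
        data += chunk
    return data

def handshake(browser_conn):
    """Read one TLS record from the browser and check whether it is a Client Hello."""
    record_header = recv_exact(browser_conn, 5)  # content type (1 byte), version (2), length (2)
    length = int.from_bytes(record_header[3:5], 'big')  # length of the record body
    body = recv_exact(browser_conn, length)
    if record_header[0] == 0x16 and body[:1] == b'\x01':
        print("Client hello message received")
    else:
        print("Not a client hello message")
    version = versions.get(record_header[1:3], 'unknown')
    return version, length, record_header + body  # the caller still has to forward these bytes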

In [9]:
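# Two TLS records back to back: a ChangeCipherSpec record (\x14, 5-byte header + 1-byte body)
# and an encrypted record (\x17) whose header announces 0x0035 = 53 bytes: 6 + 5 + 53 = 64
g = '\x14\x03\x03\x00\x01\x01\x17\x03\x03\x005!\xbd\x01\xc6\xd5\xbf?\\K\xee\x80\xa6\xa2,\xec\x7f\xb02vhN2\xbf\xfc\x89\xac\xe9\x87\xa1\x15\xa5\xda\x0e\x7fH_\x03\xe6\xc7\x1b\xbe\xf0\xef\x9a\xd3R\xcf\x07UL\x97\x11\x08'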
len(g)


64
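One chunk returned by `recv()` can hold several TLS records, as `g` does. A sketch that walks a buffer record by record using the 2-byte length in each header; `split_records` is a hypothetical helper. Applied to `g.encode('latin-1')` (which maps each character back to the original byte), it should yield a 1-byte ChangeCipherSpec record and a 53-byte encrypted record.

```python
def split_records(data: bytes):
    """Split a buffer into TLS records; return [(content_type, body), ...] and leftover bytes."""
    records = []
    offset = 0
    while len(data) - offset >= 5:                 # a full 5-byte record header is available
        content_type = data[offset]
        length = int.from_bytes(data[offset + 3:offset + 5], 'big')
        end = offset + 5 + length
        if end > len(data):                        # record not fully received yet
            break
        records.append((content_type, data[offset + 5:end]))
        offset = end
    return records, data[offset:]                  # leftover = start of an incomplete record


records, rest = split_records(b'\x14\x03\x03\x00\x01\x01' + b'\x17\x03\x03\x00\x03abc' + b'\x17\x03')
print([(t, len(body)) for t, body in records])  # [(20, 1), (23, 3)]
print(rest)                                      # b'\x17\x03'
```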

In [None]:
# client hello
# 66 server
# server hello
# 66 client
# server something
# 66 client
# server something
# client change cipher
# client
# client
# client
# server


## First we need to create a server socket 

In [None]:
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((LOCALHOST, SERVER_PORT))
    print(f"Server started on LOCALHOST:{SERVER_PORT}.")
    server.listen(MAX_CONNS)
    print("Listening for connections...")
    while True:
        try:
            # conn receives the data (request headers) from the browser; it also sends data (e.g. the remote host's response) back to the browser
            conn, addr = server.accept()
            print(f"Accepted connection from {addr[0]}:{addr[1]}")
            data = conn.recv(BUFFER_SIZE)  # browser request headers
            if not data:
                conn.close()
                continue  # nothing received: wait for the next connection
            print('\n', data, '\n')
            if data[:7] == b'CONNECT':
                # https request
                start_new_thread(https_thread_wrapper, (conn, data, 'https'))
            elif data[:3] == b'GET' or data[:4] == b'POST':
                # http request
                start_new_thread(http_thread_wrapper, (conn, data, 'http'))
            else:
                conn.close()  # unsupported request method
        except KeyboardInterrupt:
            print("Proxy shutting down...")
            break  # leaving the with-block closes the server socket

## How does `socket.timeout` work:
### Useful link: https://medium.com/pipedrive-engineering/socket-timeout-an-important-but-not-simple-issue-with-python-4bb3c58386b4
### In this case we are setting the timeout for `socket.recv()`. Normally, while waiting to receive data from the server, a blocking socket can hang for a long time if there is no data, which makes the proxy too slow to respond to the browser and ultimately makes the request fail. Python sockets have three modes: blocking (the default); non-blocking (`socket.setblocking(0)`), where `recv()` raises `BlockingIOError` immediately if no data is waiting; and timeout mode (`socket.settimeout(t)`), where `recv()` waits at most `t` seconds and then raises `socket.timeout`. Both exceptions are subclasses of `socket.error` (an alias of `OSError`), and whichever of `setblocking()` / `settimeout()` is called last determines the mode.
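A small demonstration of the non-blocking and timeout modes, assuming a connected pair of sockets from `socket.socketpair()` instead of a real browser and server. `demo_socket_modes` is a hypothetical name; since Python 3.10, `socket.timeout` is an alias of the built-in `TimeoutError`.

```python
import socket


def demo_socket_modes():
    """Show what recv() does in non-blocking mode, in timeout mode, and when data arrives."""
    outcomes = []
    a, b = socket.socketpair()
    with a, b:
        # non-blocking mode: recv() fails immediately if nothing is waiting
        a.setblocking(False)              # same as a.settimeout(0.0)
        try:
            a.recv(1024)
        except BlockingIOError:
            outcomes.append('would block')
        # timeout mode: recv() waits up to 0.5 s, then raises socket.timeout
        a.settimeout(0.5)                 # overrides the setblocking(False) above
        try:
            a.recv(1024)
        except socket.timeout:
            outcomes.append('timed out')
        # data arrives within the timeout: recv() simply returns it
        b.sendall(b'hi')
        outcomes.append(a.recv(1024))
    return outcomes


print(demo_socket_modes())  # ['would block', 'timed out', b'hi']
```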

### Last revisit

In [6]:
def parse_https(data):
	"""
	Parses the 'HTTP CONNECT' tunnel method(HTTPS)
	"""
	headers = data.decode()
	split = headers.split('\r\n')
	
	host_re = r'Host:\s([a-z0-9\.\-]+):(\d+)'
	remote_host = re.search(host_re, headers).group(1)
	port = int(re.search(host_re, headers).group(2))
	user_agent = split[1].split('User-Agent: ')[-1]
	protocol = split[0].split(' ')[-1]
	print("hostname:", remote_host)
	print("port:", port)
	print("user agent:", user_agent)
	print("protocol", protocol)

In [24]:
# Example
print(data)
print('_'*50)
parse_https(data)

b'CONNECT incoming.telemetry.mozilla.org:443 HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0\r\nProxy-Connection: keep-alive\r\nConnection: keep-alive\r\nHost: incoming.telemetry.mozilla.org:443\r\n\r\n'
__________________________________________________
hostname: incoming.telemetry.mozilla.org
port: 443
user agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0
protocol HTTP/1.1


In [6]:
# decode the data and split by line
# decoded = data.decode()
# split = decoded.split('\r\n')
# print(split)
# print()
request_dict = {}  # dictionary that stores the whole request data
test_req = b'CONNECT incoming.telemetry.mozilla.org:443 HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0\r\nProxy-Connection: keep-alive\r\nConnection: keep-alive\r\nHost: incoming.telemetry.mozilla.org:443\r\n\r\n'
split = test_req.split(b'\r\n')
print(split)

host_re = rb'Host:\s([a-z0-9\.\-]+):(\d+)'
remote_server = re.search(host_re, test_req).group(1)
port = int(re.search(host_re, test_req).group(2))
request_dict['Remote-Server'] = remote_server
request_dict['Port'] = port

for line in split[1:]:
	if line == b'':
		break
	k, v = line.split(b': ', 1)  # split on the first ': ' only; header values may contain ': '
	request_dict[k.decode()] = v

# show the parsed example request, as a dict
request_dict

[b'CONNECT incoming.telemetry.mozilla.org:443 HTTP/1.1', b'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0', b'Proxy-Connection: keep-alive', b'Connection: keep-alive', b'Host: incoming.telemetry.mozilla.org:443', b'', b'']


{'Remote-Server': b'incoming.telemetry.mozilla.org',
 'Port': 443,
 'User-Agent': b'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0',
 'Proxy-Connection': b'keep-alive',
 'Connection': b'keep-alive',
 'Host': b'incoming.telemetry.mozilla.org:443'}

#### How it works
1. Server socket gets created
2. `While True` loop that is constantly running and accepting new connections
3. `HTTP` connections get immediately resolved, `HTTPS` connections are *persistent* and stay open

*NOTE for HTTPS: This is NOT a 'man-in-the-middle' proxy. The data sent between the browser and the remote web server is encrypted end to end, and the proxy never decrypts it. A man-in-the-middle proxy would need its own certificates to decrypt the browser's data, read it, then encrypt it again and send it to the remote web server; the browser would also need to trust the proxy server's certificates. This proxy ONLY relays the data; it DOES NOT read it.*
*However, since HTTP connections are not encrypted, data going through the proxy over an HTTP connection can easily be read.*

### How to handle `keep-alive`/`HTTPS` connections:
1. Read the `CONNECT` browser request and determine if the connection should be `keep-alive`
2. Once confirmed, inform the browser of the successful connection to the proxy server and confirm that the connection will now be `keep-alive`:<br>
	`b'HTTP/1.1 200\r\nConnection: keep-alive\r\n\r\n'`
3. Call a separate function for handling `keep-alive` connections.
4. Set a custom timer for how long the connection should stay open (e.g. 60 s) **CURRENTLY USED**<br>
5. \- Other options: <br> **5.1:** Handle `BrokenPipeError`: if this error occurs, the server has closed the connection, so we need to catch it and close the connection. <br>
**5.2:** Handle it by examining the TCP packets: if the server wants to close the connection, it sends a packet with the `FIN` flag set in its TCP header. The program should parse packets coming from the server and check whether the `FIN` flag is set to `1`; if it is, we should close the connection. This approach is harder and requires additional code to inspect and parse `keep-alive` requests, hence I won't be using it, for now! (At the socket level, a received `FIN` already shows up as `recv()` returning `b''`, so this check does not strictly require parsing raw packets.)
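A sketch combining the timer from point 4 with close detection, assuming `select.select` to wait for data and treating `recv()` returning `b''` as "the peer closed the connection". `wait_for_close` and `max_lifetime` are hypothetical names; a real proxy would forward the received data instead of discarding it, and an abrupt reset would instead raise `ConnectionResetError`.

```python
import select
import socket
import time


def wait_for_close(sock, max_lifetime=60.0):
    """Return 'closed' when the peer closes (recv() -> b''), or 'expired' after max_lifetime s."""
    deadline = time.monotonic() + max_lifetime
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return 'expired'                       # the fixed-timer approach from point 4
        readable, _, _ = select.select([sock], [], [], remaining)
        if readable:
            data = sock.recv(4096)
            if not data:
                # the peer closed its side (for TCP: a FIN arrived); the socket API reports b''
                return 'closed'
            # a real proxy would forward `data` here instead of discarding it
```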