# Improving runtimes of avi-scraper functions using multithreading

I got the lowdown on how I might improve the runtimes of the HTML parser I built to extract avi data, so I'm going to try some of that out using multithreading.

```py
import numpy as np
```


I was able to improve the runtime of **getAviData** by running the function responsible for requesting archive pages in multiple threads. Below is the timing harness used to compare **getAviData** with **getAviData_Optimized**.

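```py
import time

import pandas as pd

# getAviData_Optimized lives in helpers.py (assumed importable from this notebook)
from helpers import getAviData_Optimized

EXPECTED_ROWS = 1043  # number of records available at the time of this test

test = ["test1", "test2", "test3", "test4", "test5"]
for x in test:
    start = time.perf_counter()

    # This is the work-horse that gets the data.
    # Put the function to be tested here:
    getAviData_Optimized(0, x)
    path = "./" + x + "_zone_0_aviDanger.csv"

    # Stop the clock once the CSV holds the expected number of rows.
    # The file may not exist yet or may be half-written, so tolerate read
    # errors and sleep briefly between checks instead of spinning.
    while True:
        try:
            df = pd.read_csv(path)
            if len(df) >= EXPECTED_ROWS:
                break
        except (FileNotFoundError, pd.errors.EmptyDataError, pd.errors.ParserError):
            pass
        time.sleep(0.5)

    elapsed_time = (time.perf_counter() - start) / 60
    print(f"{x} took {elapsed_time:.2f} minutes")
```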

```py
# Elapsed time in minutes for each of the five trials
Not_optimized = np.array([21.95, 25.2, 25.52, 25.7, 25.8])
Optimized = np.array([1.82, 1.12, 1.38, 1.20, 1.33])

mean_not_opt = Not_optimized.mean()
mean_opt = Optimized.mean()
# np.std defaults to the population standard deviation (ddof=0)
std_not_opt = Not_optimized.std()
std_opt = Optimized.std()
print(mean_not_opt, mean_opt, std_not_opt, std_opt)
```

```
24.834 1.37 1.4563735784475083 0.24314604664686612
```


## Results 

| Function type | Trial 1 (min) | Trial 2 (min) | Trial 3 (min) | Trial 4 (min) | Trial 5 (min) | Mean (min) | Standard Deviation (min) |
|---------------|---------------|---------------|---------------|---------------|---------------|------------|--------------------------|
| Not Optimized | 21.95         | 25.20         | 25.52         | 25.70         | 25.80         | 24.83      | 1.46                     |
| Optimized     | 1.82          | 1.12          | 1.38          | 1.20          | 1.33          | 1.37       | 0.24                     |
    


On average, **getAviData_Optimized** ran about 18x faster than **getAviData** (1.37 vs. 24.83 minutes per run).
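The 18x figure is the ratio of the two mean run times in the table. A quick check, alongside a more conservative worst-case ratio (fastest unoptimized trial divided by slowest optimized trial) for comparison:

```py
import numpy as np

# Trial times in minutes, copied from the Results table
not_optimized = np.array([21.95, 25.20, 25.52, 25.70, 25.80])
optimized = np.array([1.82, 1.12, 1.38, 1.20, 1.33])

# Ratio of the average run times
speedup_of_means = not_optimized.mean() / optimized.mean()

# Conservative bound: best unoptimized run vs. worst optimized run
worst_case_speedup = not_optimized.min() / optimized.max()

print(f"speedup of means:   {speedup_of_means:.1f}x")
print(f"worst-case speedup: {worst_case_speedup:.1f}x")
```

Even the worst-case pairing comes out well above 10x, so the improvement isn't an artifact of one lucky trial.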

Take a look at **getAviData_Optimized** in helpers.py to see how the function was optimized.
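helpers.py isn't reproduced here, so the following is not the actual **getAviData_Optimized** code. It's a minimal sketch of the general pattern, assuming the slow part is waiting on HTTP responses for archive pages. `fetch_archive_page`, `fetch_sequential`, and `fetch_threaded` are hypothetical names, and `time.sleep` stands in for the network request. Threads help here because CPython releases the GIL while a thread waits on I/O (or sleeps), so the waits overlap; CPU-bound parsing would not speed up this way.

```py
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_archive_page(page_number):
    """Hypothetical stand-in for requesting one archive page.

    time.sleep simulates network latency; a real version would send an
    HTTP request here and return the page's HTML.
    """
    time.sleep(0.1)
    return f"<html>archive page {page_number}</html>"


def fetch_sequential(page_numbers):
    # One request at a time: total time is roughly the sum of every wait.
    return [fetch_archive_page(p) for p in page_numbers]


def fetch_threaded(page_numbers, max_workers=8):
    # Up to max_workers requests wait on I/O at the same time.
    # executor.map yields results in input order, and leaving the
    # with-block waits until every submitted call has finished.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_archive_page, page_numbers))


pages = list(range(16))

start = time.perf_counter()
sequential_pages = fetch_sequential(pages)
print(f"sequential: {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
threaded_pages = fetch_threaded(pages)
print(f"threaded:   {time.perf_counter() - start:.2f} s")

# Same pages, same order, whichever way they were fetched
assert sequential_pages == threaded_pages
```

Because `fetch_threaded` only returns once every page is back, a function built this way can be timed directly around the call, without polling an output file for the expected row count.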