# Graph Generator

> This notebook generates graph instances for the travelling salesman problem (TSP) from a dataset of UK pubs.

In [1]:
import pandas as pd

Load the pubs dataset and preview the first rows.

In [2]:
df_full = pd.read_csv('../pubs_crawler/pubs.csv')
df_full.head()

Unnamed: 0,pub_name,region,locality,url,longitude,latitude,postcode,country
0,The Woburn Hotel in Woburn,Bedfordshire,Woburn,https://www.pubsgalore.co.uk/pubs/1355/,-0.618888,51.988484,MK17 9PX,England
1,Ye Olde Leathern Bottel in Wokingham,Berkshire,Wokingham,https://www.pubsgalore.co.uk/pubs/2002/,-0.862637,51.403176,RG41 4BY,England
2,The Five Bells in Tydd St. Mary,Cambridgeshire,Tydd St. Mary,https://www.pubsgalore.co.uk/pubs/3175/,0.139242,52.745956,PE13 5QH,England
3,Five Bells Hotel in Upwell,Cambridgeshire,Upwell,https://www.pubsgalore.co.uk/pubs/3174/,0.221527,52.601752,PE14 9AA,England
4,The Woolpack in Shefford,Bedfordshire,Shefford,https://www.pubsgalore.co.uk/pubs/1339/,-0.32757,52.03698,SG17 5JF,England


Select a random subset of pubs of a predetermined size.

In [3]:
SAMPLE_SIZE = 1000

df_full.sample(SAMPLE_SIZE)

Unnamed: 0,pub_name,region,locality,url,longitude,latitude,postcode,country
49552,Lion Hotel in Llanbister,Powys,Llanbister,https://www.pubsgalore.co.uk/pubs/79399/,-3.311595,52.351470,LD1 6TN,Wales
8932,Robin Hood in Barrow-In-Furness,Cumbria,Barrow-In-Furness,https://www.pubsgalore.co.uk/pubs/5983/,-3.224721,54.113492,LA14 1DU,England
20134,The Stanley Gate in Ormskirk,Lancashire,Ormskirk,https://www.pubsgalore.co.uk/pubs/56491/,-2.841940,53.541340,L39 9EN,England
21321,Ma Kellys in Blackpool,Lancashire,Blackpool,https://www.pubsgalore.co.uk/pubs/18091/,-3.051262,53.820281,FY1 1LL,England
17892,The Plough Inn in Farnsfield,Nottinghamshire,Farnsfield,https://www.pubsgalore.co.uk/pubs/30761/,-1.030942,53.101987,NG22 8EA,England
...,...,...,...,...,...,...,...,...
15149,The Carpenters Arms in Fangfoss,North Yorkshire,Fangfoss,https://www.pubsgalore.co.uk/pubs/63067/,-0.834907,53.968419,YO41 5QG,England
13567,The Bull in Watton At Stone,Hertfordshire,Watton At Stone,https://www.pubsgalore.co.uk/pubs/14546/,-0.109436,51.856562,SG14 3SB,England
28728,Cherry Tree in Melton Mowbray,Leicestershire,Melton Mowbray,https://www.pubsgalore.co.uk/pubs/58951/,-0.895616,52.754321,LE13 0EW,England
3158,Trouble House in Tetbury,Gloucestershire,Tetbury,https://www.pubsgalore.co.uk/pubs/12853/,-2.126056,51.657299,GL8 8SG,England
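
The sample above changes on every run. If reproducible subsets are wanted, `DataFrame.sample` accepts a `random_state` argument. The sketch below uses a small stand-in dataframe rather than `pubs.csv`, and the seed value is an arbitrary assumption, not something the original notebook sets.

```python
import pandas as pd

# Small stand-in frame; the notebook itself samples from df_full.
df_demo = pd.DataFrame({"pub_name": ["Pub {}".format(i) for i in range(100)]})

SEED = 42  # hypothetical seed, chosen only for illustration

sample_a = df_demo.sample(10, random_state=SEED)
sample_b = df_demo.sample(10, random_state=SEED)

# With the same seed, both calls select the same rows in the same order.
print(sample_a.index.equals(sample_b.index))
```

Without `random_state`, two calls would generally return different rows.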


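Draw `n_subsets` such samples without replacement: each sample is dropped from `df_full` after it is taken, so no pub appears in more than one subset.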

In [4]:
n_subsets = 10 # number of subsets

assert n_subsets * SAMPLE_SIZE <= len(df_full)  # Make sure there is enough data to take samples without replacement

subsets = []

for i in range(n_subsets):
    df_temp = df_full.sample(SAMPLE_SIZE)
    subsets.append(df_temp)
    df_full.drop(df_temp.index, inplace=True)
    
print("Number of subsets:" + str(len(subsets)))

Number of subsets:10
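
The loop above shrinks `df_full` in place, so re-running the cell draws from an already reduced dataframe. One possible alternative, sketched here under the assumption that leaving the source dataframe untouched is preferable, is to shuffle once and cut the result into consecutive slices. The helper name `disjoint_subsets` and the stand-in data are hypothetical.

```python
import pandas as pd

def disjoint_subsets(df, n_subsets, sample_size, seed=None):
    """Return n_subsets non-overlapping random samples of df, each of sample_size rows."""
    if n_subsets * sample_size > len(df):
        raise ValueError("not enough rows to sample without replacement")
    # Shuffle every row once, then slice; df itself is not modified.
    shuffled = df.sample(frac=1, random_state=seed)
    return [shuffled.iloc[k * sample_size:(k + 1) * sample_size]
            for k in range(n_subsets)]

# Stand-in data; the notebook would pass df_full, 10 and SAMPLE_SIZE instead.
df_demo = pd.DataFrame({"x": range(50)})
parts = disjoint_subsets(df_demo, n_subsets=4, sample_size=10, seed=0)

print("Number of subsets:" + str(len(parts)))
```

Because every row lands in at most one slice, the subsets are disjoint by construction.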


Reset each subset's index so that its rows are numbered from 0. The original row labels are kept in a new `index` column.

In [5]:
for sub in subsets:
    # sub.set_index(pd.Index([i for i in range(1,len(sub)+1)]), inplace=True)
    sub.reset_index(inplace=True)
    
subsets[0].head()

Unnamed: 0,index,pub_name,region,locality,url,longitude,latitude,postcode,country
0,16947,The Havelock Tavern in W14,London (Greater),W14,https://www.pubsgalore.co.uk/pubs/24820/,-0.216122,51.499286,W14 0LS,England
1,37807,The White Hart in Shard End,West Midlands,Shard End,https://www.pubsgalore.co.uk/pubs/37952/,-1.761813,52.483136,B33 9UU,England
2,13302,The Three Moorhens in Hitchin,Hertfordshire,Hitchin,https://www.pubsgalore.co.uk/pubs/59390/,-0.27598,51.941321,SG4 9AJ,England
3,22108,The Slug & Lettuce in Manchester,Manchester (Greater),Manchester,https://www.pubsgalore.co.uk/pubs/55844/,-2.246072,53.479275,M2 5HD,England
4,42198,Bayview Bar in Methil,Fife,Methil,https://www.pubsgalore.co.uk/pubs/80112/,-3.013794,56.186123,KY8 3NA,Scotland


Define a function that writes a dataframe of pub coordinates to a .tsp file that can be run with the Concorde TSP solver.

In [6]:
import os

def tsp_file(data, name='unnamed', DIR='.'):
    """
    Arguments:
    data -- dataframe containing longitude and latitude columns; its index values are written as the node numbers
    name -- Name of the TSP file.
    DIR -- Directory in which the file is created.
    
    Creates a <name>.tsp file in the .tsp format
    """
    
    # The with-block closes the file even if a write fails.
    with open(os.path.join(DIR, name + ".tsp"), "w") as file:
        # Header section
        file.write("NAME: " + name + "\n")
        file.write("TYPE: TSP\n")
        file.write("COMMENT: " + str(len(data)) + " pub locations in the UK\n")
        file.write("DIMENSION: " + str(len(data)) + "\n")
        file.write("EDGE_WEIGHT_TYPE: EUC_2D\n")
        file.write("NODE_COORD_SECTION\n")
        
        # One line per node: <node number> <x = longitude> <y = latitude>
        for i in range(len(data)):
            file.write("{} {:.6f} {:.6f}\n".format(data.index[i], data['longitude'].iloc[i], data['latitude'].iloc[i]))
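
One point worth checking with `EDGE_WEIGHT_TYPE: EUC_2D`: the TSPLIB format defines this distance as the Euclidean distance rounded to the nearest integer. With coordinates in degrees, nearby pubs can therefore end up at distance 0. The sketch below illustrates the effect on two pubs from the preview table. The scaling factor is a hypothetical workaround, not something the original notebook applies, and treating longitude and latitude as planar coordinates remains an approximation either way.

```python
import math

def euc_2d(a, b):
    """EUC_2D distance as defined by TSPLIB: Euclidean distance rounded to the nearest integer."""
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return int(math.sqrt(dx * dx + dy * dy) + 0.5)

# (longitude, latitude) of two pubs from the preview of pubs.csv
woburn = (-0.618888, 51.988484)
shefford = (-0.327570, 52.036980)

SCALE = 1000000  # hypothetical factor: turns six decimal places into integers

def scaled(point):
    """Multiply both coordinates by SCALE and round to whole numbers."""
    return (round(point[0] * SCALE), round(point[1] * SCALE))

print(euc_2d(woburn, shefford))                  # distance on raw degrees
print(euc_2d(scaled(woburn), scaled(shefford)))  # distance on scaled coordinates
```

If this rounding matters for the experiments, the coordinates could be scaled before they are passed to `tsp_file`.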

Convert the subsets to .tsp files in `./Graphs`. The directory must already exist.

In [8]:
for i, sub in enumerate(subsets):
    tsp_file(sub, 'Graph' + str(i), DIR='./Graphs')
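
A quick sanity check on the generated files is to confirm that the declared `DIMENSION` matches the number of coordinate lines. The sketch below is one way to do that. The helper `check_tsp_file` is hypothetical, and the demo writes a tiny hand-made file to a temporary directory rather than reading from `./Graphs`.

```python
import os
import tempfile

def check_tsp_file(path):
    """Return (declared DIMENSION, number of coordinate lines) for a .tsp file."""
    declared = None
    n_coords = 0
    in_coords = False
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line == "EOF":
                continue
            if in_coords:
                n_coords += 1  # every remaining line is one node
            elif line.startswith("DIMENSION"):
                declared = int(line.split(":")[1])
            elif line == "NODE_COORD_SECTION":
                in_coords = True
    return declared, n_coords

# Tiny file in the same layout that tsp_file produces.
demo = (
    "NAME: demo\n"
    "TYPE: TSP\n"
    "COMMENT: 2 pub locations in the UK\n"
    "DIMENSION: 2\n"
    "EDGE_WEIGHT_TYPE: EUC_2D\n"
    "NODE_COORD_SECTION\n"
    "0 -0.618888 51.988484\n"
    "1 -0.327570 52.036980\n"
)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tsp")
with open(demo_path, "w") as fh:
    fh.write(demo)

declared, n_coords = check_tsp_file(demo_path)
print(declared == n_coords)
```

In the notebook, the same check could be run over each `Graph<i>.tsp` file written above.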