This project contains Python scripts for studying Location Based Social Networks and disease simulation.

Description of files:

iofiles.tar : contains input files used by some scripts, and a few results
  * citydata/
    - census_raw.csv = original population data
    - census.out; census.p = dictionary city -> population
    - city_coor.csv = original coordinates data
    - city_coor.out; city_coordinates.p = dictionary city -> (latitude,longitude)
    - parse_census.py = parser for population data
    - parse_citycoor.py = parser for coordinates data
    - states.p = dictionary state_name -> state_abbrv
  * google_data/
    - all.csv = Google Flu Trends data 2004-2012(May)
    - match7-365d-norm-labeled.csv = May 10, 2009 to May 2, 2010, normalized
    - match7-norm = same as above, no date label
  * out/
    - city_list.p = list of cities
    - coordinates.p = dictionary city -> (latitude,longitude)
    - locations.p = dictionary city -> list of user_names
    - school_contacts.p = list of number of contacts per day (from school contacts)
    - contact_dist.out = list of number of contacts per day (from Gowalla data)
    - trans_prob.csv = transition probability matrix
  * results/
    * gowalla/
      - avg-11.csv = 52-week results from Gowalla network
      - avg-11-norm.csv = 52-week results, normalized
      - avg-11-norm-labeled.csv = 52-week results with date and state labels
      - matrix-*.out = 365-day simulation results
      - scores11 = sorted distances for every 52-week subset against Google Flu
          <results-file>-<google_flu_starting_week_number>:<distance_score>
      - statescores-11 = state-by-state distance against Google Flu on selected week
          (this is for starting week 281)
          <state_index> <state_abbrv>:<distance_score>
    * perm/
      - (same naming convention as above but results for permuted transition probabilities)
    * rand/
      - (similar files but results for randomized transition probabilities)
      - matrix0-1-randomT.csv = randomized transition probability matrix

run_sim.sh : executes faster_sim on grid using runCmd
  - outputs the resulting matrices in a results/ directory

faster_sim.py : modified disease simulation using vectors for disease compartments
  - outputs [matrix$] with the incidences for each state at each time step (365 timesteps x #states)
  #python faster_sim.py [n] [prob] out/school_contacts.p citydata/states.p citydata/census.p out/trans_prob.csv citydata/city_list.p [google_data/all.csv] [matrix$]

util.py : contains functions used in simulation

data_analysis/compare_avg.py : computes the 52-week tally for each state
  - outputs 52-week tallies and normalized values
  #python compute_avg.py matrix$count n avg-matrix-$count

data_analysis/compare_google.py : computes the Euclidean distance between the resulting normalized matrix and all possible Google matches
  - outputs list of distances; the index corresponds to the week number from the Google data
  #python faster_sim.py [n] [prob] out/school_contacts.p citydata/states.p citydata/census.p out/trans_prob.csv citydata/city_list.p [google_data/all.csv] [matrix$]

-----------------------------------------------------------------------------
This section needs to be updated:

split_data.py : parses raw Gowalla data to extract desired fields
  #python split_data.py [gowalla_raw] [output]

fix_locations.py : fixes location info from citydata (census/coord)
  - outputs not_found.out for unfound cities
  #python fix_locations.py location_data.raw census.p city_coordinates.p fixed_locations.in

fix_time.py : fixes timestamps to standard Eastern timezone
  #python fix_time.py fixed_locations.in timezones.in fixed_time.in

citydata/ : directory contains raw data for census, city coordinates, parser for raw files and pickled data
  **TODO: need to update raw census data, check mismatch with city_coord

datadist_analysis.py : extracts info about checkins
  - time_diff, distance, year09, year10, freq
  #python datadist_analysis.py location_data.in data_info.out

distance_time.py
checkins_time.py
calc_speed.py
calc_timespeed.py
  - extract info from data_info.out
  - time_diff, distance, total_checkins, total_time, speed

chartmaker/ : directory contains scripts for producing charts

remove_freq.py : removes freq/problem users
  #python remove_freq.py [location_data] freq.out fixed_freq.in removed.out

loc_stats.py : extracts city based info
  - locations.p : maps city to list of users that visited the city
  - coordinates.p : maps city to avg (lat, long) coordinates from all locations in that city
  - location_stats.out : city/state user statistics
  #python loc_stats.py location_data.in location_stats.out

user_list.py : extracts user checkin history
  - user_checkins.out : maps user to list of (city, date) checkins sorted chronologically
  - finds and separates Austin users
  #python user_list.py location_data.in user_checkins.out austin.out

build_network.py : constructs location-based network from user checkin history
  - uses LocationGraph from locations.py
  - parses user_checkins.out to rebuild user checkin history
  - for each user, traverse history, add edge from one city to the next
  - for every city node created, set coordinates from coordinates.p
  - for each Austin walker, set edge weights as .5
  - add epsilon (5) weight between all pairs of city nodes
  - save network as gowalla_net
  - outputs network info
  #python build_network.py user_checkins.out austin.out coordinates.p gowalla_net

locations.py : LocationGraph data structure
  - uses networkx, directed graph

simulations.py : executes disease simulation
  #python simulation.py [time_steps] [init_n] [n] [prob] locations.p census.p gowalla_net sim.out
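
The network construction steps listed for build_network.py can be sketched roughly as below. This is an illustration only, not the code in the repository: the function name `build_network`, its arguments, and the in-memory data shapes are hypothetical, and plain dictionaries stand in for the networkx-based LocationGraph so the sketch has no dependencies. Two readings of the description are assumptions: that an Austin user adds 0.5 per move where other users add 1.0, and that consecutive checkins in the same city add no edge.

```python
from collections import defaultdict
from itertools import permutations

EPSILON = 5  # baseline weight between all pairs of city nodes, per the README


def build_network(user_checkins, austin_users, coordinates, epsilon=EPSILON):
    """Build a directed, weighted city graph from user checkin histories.

    user_checkins: dict user -> list of (city, date), sorted chronologically
    austin_users:  set of users flagged as Austin users
    coordinates:   dict city -> (latitude, longitude)
    Returns (nodes, weights): nodes maps city -> coordinates (or None),
    weights maps (from_city, to_city) -> edge weight.
    """
    weights = defaultdict(float)
    nodes = {}

    for user, history in user_checkins.items():
        # Assumption: Austin users contribute 0.5 per move, everyone else 1.0.
        step = 0.5 if user in austin_users else 1.0
        cities = [city for city, _date in history]

        # Every city seen becomes a node; attach its coordinates if known.
        for city in cities:
            nodes.setdefault(city, coordinates.get(city))

        # Walk the history and add an edge from each city to the next one.
        for src, dst in zip(cities, cities[1:]):
            if src != dst:  # assumption: ignore repeat checkins in one city
                weights[(src, dst)] += step

    # Add the epsilon weight to every ordered pair of distinct cities.
    for src, dst in permutations(nodes, 2):
        weights[(src, dst)] += epsilon

    return nodes, dict(weights)


if __name__ == "__main__":
    # Made-up example data, not taken from the Gowalla dataset.
    checkins = {
        "alice": [("Austin", "2010-01-01"), ("Dallas", "2010-01-03")],
        "bob": [("Dallas", "2010-02-01"), ("Houston", "2010-02-02")],
    }
    nodes, weights = build_network(checkins, {"alice"}, {"Austin": (30.27, -97.74)})
    for edge, weight in sorted(weights.items()):
        print(edge, weight)
```

With networkx, the same idea would map onto a directed graph with a weight attribute per edge and a coordinates attribute per node; how LocationGraph actually stores these is not stated here.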