README

This project contains Python scripts for studying location-based social networks and for disease simulation.

Description of files:

iofiles.tar : contains input files used by some scripts, and a few results
  * citydata/
      - census_raw.csv = original population data
      - census.out; census.p = dictionary city -> population
      - city_coor.csv = original coordinates data
      - city_coor.out; city_coordinates.p = dictionary city -> (latitude,longitude)
      - parse_census.py = parser for population data
      - parse_citycoor.py = parser for coordinates data
      - states.p = dictionary state_name -> state_abbrv

  * google_data/
      - all.csv = Google Flu Trends data, 2004 to May 2012
      - match7-365d-norm-labeled.csv = May 10, 2009 to May 2, 2010, normalized
      - match7-norm = same as above, no date label

  * out/
      - city_list.p = list of cities
      - coordinates.p = dictionary city -> (latitude,longitude)
      - locations.p = dictionary city -> list of user_names
      - school_contacts.p = list of number of contacts per day (from school
        contacts)
      - contact_dist.out = list of number of contacts per day (from gowalla data)
      - trans_prob.csv = transition probability matrix

  * results/
      * gowalla/
          - avg-11.csv = 52-week results from gowalla network
          - avg-11-norm.csv = 52-week results, normalized
          - avg-11-norm-labeled.csv = 52-week results with date and states labels
          - matrix-*.out = 365-day simulation results
          - scores11 = sorted distances for every 52-week subset against Google
            flu data
              <results-file>-<google_flu_starting_week_number>:<distance_score>
          - statescores-11 = state-by-state distance against Google flu data for
            the selected week (this file is for starting week 281)
              <state_index> <state_abbrv>:<distance_score>
      * perm/
          - (same naming convention as above but results for permuted transition
            probabilities)
      * rand/
          - (similar files but results for randomized transition probabilities)
          - matrix0-1-randomT.csv = randomized transition probability matrix
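
The .p files listed above are described as pickled data. A minimal sketch of
reading one, assuming they are plain pickles of Python dictionaries. The city
key format ("Austin, TX") and the population values are made up for
illustration, and a stand-in file is written first so the sketch runs without
iofiles.tar:

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load one pickled object (for example census.p) from disk."""
    with open(path, "rb") as fh:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        return pickle.load(fh, encoding="latin1")


# Stand-in data: hypothetical keys and values, not the real census.p contents.
tmp_dir = tempfile.mkdtemp()
census_path = os.path.join(tmp_dir, "census.p")
with open(census_path, "wb") as fh:
    pickle.dump({"Austin, TX": 1000, "Dallas, TX": 2000}, fh)

census = load_pickle(census_path)  # dictionary city -> population
print(census["Austin, TX"])
```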
 


run_sim.sh : runs faster_sim.py on the grid using runCmd
    - writes the resulting matrices to a results/ directory

faster_sim.py : modified disease simulation using vectors for disease compartments
    - outputs [matrix$] with the incidence for each state at each time step
      (365 time steps x #states)
#python faster_sim.py [n] [prob] out/school_contacts.p citydata/states.p citydata/census.p out/trans_prob.csv citydata/city_list.p [google_data/all.csv] [matrix$]
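
To illustrate what "vectors for disease compartments" and a
(time steps x #states) incidence matrix can look like, here is a generic
sketch. It is not the code in faster_sim.py: the SIR model, the parameters
beta and gamma, and the way the transition matrix mixes states are all
assumptions made for this example.

```python
def simulate(pop, infected0, trans, beta, gamma, steps):
    """Toy SIR model with one S, I and R entry per state (all hypothetical)."""
    n = len(pop)
    S = [p - i for p, i in zip(pop, infected0)]  # susceptible per state
    I = list(infected0)                          # infectious per state
    R = [0.0] * n                                # recovered per state
    incidence = []                               # one row per time step
    for _ in range(steps):
        # Assumption: trans[j][i] is the share of state j's contacts that
        # happen in state i, so it carries infection pressure from j to i.
        pressure = [
            sum(trans[j][i] * I[j] / pop[j] for j in range(n))
            for i in range(n)
        ]
        # New infections are capped by the remaining susceptibles.
        new_inf = [min(S[i], beta * S[i] * pressure[i]) for i in range(n)]
        new_rec = [gamma * I[i] for i in range(n)]
        for i in range(n):
            S[i] -= new_inf[i]
            I[i] += new_inf[i] - new_rec[i]
            R[i] += new_rec[i]
        incidence.append(new_inf)
    return incidence


# Two made-up states; the outbreak starts with 10 cases in state 0.
incidence = simulate(
    pop=[1000.0, 1000.0],
    infected0=[10.0, 0.0],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    beta=0.3,
    gamma=0.1,
    steps=365,
)
print(len(incidence), len(incidence[0]))
```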

util.py : contains functions used in simulation

data_analysis/compare_avg.py : computes the 52-week tally for each state
    - outputs 52-week tallies and normalized values
#python compute_avg.py matrix$count n avg-matrix-$count
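
A rough sketch of the 52-week tally and normalization described above. Two
choices here are assumptions, not taken from the script: days are grouped
into consecutive 7-day blocks (so day 365 is dropped), and each state's
column is normalized by its own 52-week total.

```python
def weekly_tally(daily, weeks=52, days_per_week=7):
    """Sum a (days x states) matrix into a (weeks x states) matrix."""
    n_states = len(daily[0])
    tally = []
    for w in range(weeks):
        block = daily[w * days_per_week:(w + 1) * days_per_week]
        tally.append([sum(row[s] for row in block) for s in range(n_states)])
    return tally


def normalize_columns(tally):
    """Divide every state's column by that column's total (assumed scheme)."""
    n_states = len(tally[0])
    totals = [sum(row[s] for row in tally) for s in range(n_states)]
    return [
        [row[s] / totals[s] if totals[s] else 0.0 for s in range(n_states)]
        for row in tally
    ]


# Made-up input: 365 days, 2 states, constant daily incidence.
daily = [[1, 2] for _ in range(365)]
weekly = weekly_tally(daily)
norm = normalize_columns(weekly)
print(len(weekly), weekly[0])
```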

data_analysis/compare_google.py : computes the Euclidean distance between the
resulting normalized matrix and every possible 52-week window of the Google data
    - outputs a list of distances; each index corresponds to the starting week
      number in the Google data
#python faster_sim.py [n] [prob] out/school_contacts.p citydata/states.p citydata/census.p out/trans_prob.csv citydata/city_list.p [google_data/all.csv] [matrix$]
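
The comparison against the Google data can be pictured as a sliding window.
The sketch below assumes both inputs are (weeks x states) lists of lists with
the same state order; the function name and the tiny inputs are hypothetical.

```python
import math


def window_distances(sim, google, weeks=52):
    """Euclidean distance between sim and every weeks-long window of google.

    The index of each distance is the starting week of the window.
    """
    distances = []
    for start in range(len(google) - weeks + 1):
        window = google[start:start + weeks]
        squared = sum(
            (s - g) ** 2
            for sim_row, g_row in zip(sim, window)
            for s, g in zip(sim_row, g_row)
        )
        distances.append(math.sqrt(squared))
    return distances


# Tiny made-up example with 2-week windows and 2 states.
sim = [[0.0, 1.0], [1.0, 0.0]]
google = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 4.0]]
distances = window_distances(sim, google, weeks=2)
best_start = min(range(len(distances)), key=distances.__getitem__)
print(best_start, distances[best_start])
```

Sorting the (start, distance) pairs by distance gives a ranking like the one
stored in the scores files.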


-----------------------------------------------------------------------------
This section needs to be updated:

split_data.py : parses raw gowalla data to extract desired fields
#python split_data.py [gowalla_raw] [output]

fix_locations.py : fixes location info using citydata (census/coord)
    - outputs not_found.out listing cities that were not found
#python fix_locations.py location_data.raw census.p city_coordinates.p fixed_locations.in

fix_time.py : converts all timestamps to a single standard timezone (Eastern)
#python fix_time.py fixed_locations.in timezones.in fixed_time.in
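
A minimal sketch of shifting a local timestamp to Eastern time. The timestamp
format, the idea that each record comes with a UTC offset, and the use of a
fixed UTC-5 offset (no daylight saving) are assumptions; the real layout of
timezones.in is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset Eastern Standard Time; daylight saving is ignored on purpose.
EASTERN = timezone(timedelta(hours=-5), "EST")


def to_eastern(local_str, utc_offset_hours):
    """Interpret local_str at the given UTC offset and convert it to EST."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_str, "%Y-%m-%d %H:%M:%S")
    return local.replace(tzinfo=local_tz).astimezone(EASTERN)


# A check-in made at 09:30 in a UTC-8 location.
converted = to_eastern("2010-01-15 09:30:00", -8)
print(converted.strftime("%Y-%m-%d %H:%M:%S"))
```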

citydata/ : directory containing raw census and city coordinate data, parsers
for the raw files, and pickled data
    **TODO: update raw census data and check mismatches with city_coord

datadist_analysis.py : extracts info about checkins
    - time_diff, distance, year09, year10, freq
#python datadist_analysis.py location_data.in data_info.out

distance_time.py
checkins_time.py
calc_speed.py
calc_timespeed.py
    - extracts info from data_info.out
    - time_diff, distance, total_checkins, total_time, speed

chartmaker/ : directory containing scripts for producing charts

remove_freq.py : removes frequent/problem users
#python remove_freq.py [location_data] freq.out fixed_freq.in removed.out

loc_stats.py : extracts city-based info
    - locations.p: maps city to list of users that visited the city
    - coordinates.p : maps city to avg (lat, long) coordinates from all locations
      in that city
    - location_stats.out : city/state user statistics
#python loc_stats.py location_data.in location_stats.out
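
The two dictionaries above can be built in a single pass over the check-ins.
This sketch assumes each record is a (user, city, latitude, longitude) tuple
and averages over check-in records; the record layout and sample values are
hypothetical.

```python
from collections import defaultdict


def city_stats(checkins):
    """Return (city -> sorted user list, city -> average (lat, long))."""
    users = defaultdict(set)
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, long sum, count
    for user, city, lat, lon in checkins:
        users[city].add(user)
        acc = sums[city]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    locations = {city: sorted(names) for city, names in users.items()}
    coordinates = {
        city: (acc[0] / acc[2], acc[1] / acc[2]) for city, acc in sums.items()
    }
    return locations, coordinates


# Made-up records.
checkins = [
    ("u1", "Austin", 30.0, -98.0),
    ("u2", "Austin", 31.0, -97.0),
    ("u1", "Dallas", 32.5, -96.5),
]
locations, coordinates = city_stats(checkins)
print(locations["Austin"], coordinates["Austin"])
```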

user_list.py : extracts user checkin history
    - user_checkins.out : maps user to list of (city, date) checkins sorted
      chronologically
    - finds and separates Austin users
#python user_list.py location_data.in user_checkins.out austin.out
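
A small sketch of grouping check-ins into per-user histories sorted by date.
It assumes (user, city, date) tuples with ISO-style date strings, which sort
correctly as plain text; the on-disk format of user_checkins.out is not
reproduced here.

```python
from collections import defaultdict


def user_histories(checkins):
    """Map each user to a chronologically sorted list of (city, date)."""
    history = defaultdict(list)
    for user, city, date in checkins:
        history[user].append((city, date))
    for visits in history.values():
        visits.sort(key=lambda visit: visit[1])  # sort by date string
    return dict(history)


# Made-up records, deliberately out of order.
raw_checkins = [
    ("u1", "Dallas", "2010-02-01"),
    ("u1", "Austin", "2010-01-05"),
    ("u2", "Austin", "2010-01-07"),
]
histories = user_histories(raw_checkins)
print(histories["u1"])
```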

build_network.py : constructs location-based network from user checkin history
    - uses LocationGraph from locations.py
    - parses user_checkins.out to rebuild user checkin history
    - for each user, traverse history, add edge from one city to the next
    - for every city node created, set coordinates from coordinates.p
    - for each Austin walker, set edge weights as .5
    - add epsilon (5) weight between all pairs of city nodes
    - save network as gowalla_net
    - outputs network info
#python build_network.py user_checkins.out austin.out coordinates.p gowalla_net
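
The steps above can be sketched with a plain dictionary of edge weights. The
actual script uses LocationGraph (networkx) and is not reproduced here. The
assumptions: every transition adds 1.0 to its edge, transitions by Austin
users add 0.5 instead, and epsilon is added to every ordered pair of distinct
cities. Coordinates and saving the network are left out.

```python
from collections import defaultdict
from itertools import permutations


def build_network(histories, austin_users, epsilon):
    """Return {(src_city, dst_city): weight} from per-user histories."""
    weights = defaultdict(float)
    cities = set()
    for user, visits in histories.items():
        step = 0.5 if user in austin_users else 1.0  # assumed weighting
        names = [city for city, _date in visits]
        cities.update(names)
        # Add an edge from each city to the next one in the history.
        for src, dst in zip(names, names[1:]):
            weights[(src, dst)] += step
    # Add the epsilon weight between all ordered pairs of distinct cities.
    for src, dst in permutations(sorted(cities), 2):
        weights[(src, dst)] += epsilon
    return dict(weights)


# Made-up histories: u1 is treated as an Austin user.
histories = {
    "u1": [("Austin", "2010-01-01"), ("Dallas", "2010-01-02")],
    "u2": [("Dallas", "2010-01-01"), ("Austin", "2010-01-03")],
}
network = build_network(histories, austin_users={"u1"}, epsilon=5)
print(network[("Austin", "Dallas")], network[("Dallas", "Austin")])
```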

locations.py : LocationGraph data structure uses networkx, directed graph

simulation.py : executes disease simulation
#python simulation.py [time_steps] [init_n] [n] [prob] locations.p census.p gowalla_net sim.out