Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mobilkit Tutorial #6

Closed
levisweetbreu opened this issue Jun 14, 2023 · 1 comment
Closed

Mobilkit Tutorial #6

levisweetbreu opened this issue Jun 14, 2023 · 1 comment

Comments

@levisweetbreu
Copy link
Contributor

@takayabe0505 and @ubi15 I have reviewed mobilkit_tutorial.ipynb to assess the functionality of the package and I have a few thoughts:

  1. When merging the synthetic data with the empirical data, the merge function does not work as shown because the uid column is acting as the dataframe's index by default, and the nunique function does not work on the index column. I was able to bypass this by resetting the index before using groupby and agg, and the head of the dataframe changed slightly but the following plot did not change meaningfully.
  2. Also when merging the synthetic data with the empirical the documentation states that it is data for Mexico, but it should be for Beijing, correct?
  3. Is the Mexican data used for the remainder of the tutorial available for download? It looks like the CSV used is produced in one of the example notebooks (mobilkit_population_per_area.ipynb) but when I follow the trail I cannot find the original input data ("/data/dataHFLB/mexico/20171*/part-*.csv.gz"). If the input data is available, the tutorial should tell users how to create the CSV, and if the input data is not available for download, the tutorial should state that.

openjournals/joss-reviews#5201

mindearth pushed a commit that referenced this issue Jun 20, 2023
@ubi15
Copy link
Collaborator

ubi15 commented Jun 20, 2023

@levisweetbreu thanks a lot for your feedback on this!
Below our actions:

  1. in 8d65cf9 we fixed that issue in the mobilkit_tutorial.ipynb, we also moved to newer Dask and Pandas versions, which required a fix in the temporal.py submodule.
  2. We corrected this unclear referencing in 7cfb3a8;
  3. We added in 8d65cf9 a clear statement to say that these data are not included in the repo to preserve users' privacy.

Let us know if you need further clarifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants