Tremendous memory usage due to deep data structures #114

Open
stefanct opened this issue Jul 10, 2021 · 3 comments

Comments

@stefanct

It is not a big surprise, but getting the history of a big relation with many edits takes an enormous amount of memory. For example, api.RelationHistory(2771761) (https://www.openstreetmap.org/relation/2771761) takes over 1.5 GB of memory. This is because there is no way to specify/request shallow copies, and the data is not stored very compactly.

Is there any workaround for that?
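One way to put a number on this memory cost from inside Python is the standard library's tracemalloc module. The helper below is a minimal sketch (peak_memory is a hypothetical name, not part of osmapi). It reports the peak of Python-level allocations while a callable runs, so the figure may differ from the process size the OS reports. The demo builds an offline stand-in for a deep history structure; against the real API you would pass something like lambda: api.RelationHistory(2771761) instead.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_memory(func: Callable[[], Any]) -> Tuple[Any, int]:
    """Run func() and return (its result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    def build_deep_history():
        # Offline stand-in: 50 versions, each holding 1000 small member dicts.
        return {
            v: {"version": v,
                "member": [{"type": "way", "ref": r, "role": ""} for r in range(1000)]}
            for v in range(1, 51)
        }

    history, peak = peak_memory(build_deep_history)
    print(f"{len(history)} versions, peak {peak / 2**20:.1f} MiB")
```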

@matkoniecz

Is there support in the OSM API for getting specific revisions?

@diegocrzt

diegocrzt commented Feb 15, 2022

According to the API v0.6 documentation, there seems to be no filter or shallow option for the history endpoint.

In my tests, even the official openstreetmap.org site times out when trying to display the history of this relation.

However, there is also an endpoint for fetching a specific version, and versions appear to be positive integers that increase sequentially, starting from 1.
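For reference, the two access patterns correspond to these OSM API v0.6 calls: GET /api/0.6/relation/#id/history returns every version in one response, while GET /api/0.6/relation/#id/#version returns a single version. The sketch below only builds the URLs; the function names are illustrative, and it assumes the production base URL https://api.openstreetmap.org.

```python
API_BASE = "https://api.openstreetmap.org/api/0.6"  # assumed: production OSM API v0.6


def history_url(kind: str, element_id: int) -> str:
    """Full history: all versions in one (potentially huge) response."""
    return f"{API_BASE}/{kind}/{element_id}/history"


def version_url(kind: str, element_id: int, version: int) -> str:
    """A single version; versions start at 1 and grow by one with each edit."""
    if version < 1:
        raise ValueError("OSM element versions start at 1")
    return f"{API_BASE}/{kind}/{element_id}/{version}"


print(history_url("relation", 2771761))
print(version_url("relation", 2771761, 1))
```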

Then a workaround could be:
  • Start from the relation ID, e.g. 2771761.
  • Get the current relation: rel = api.RelationGet(2771761)
  • Read its latest version: last_version = rel['version']
    • If you need to optimize storage here, you can switch to a different data structure, move data to disk, or store only the minimum information you need. I can't think of a general solution right now.
  • Fetch the previous versions one at a time, starting with previous_version = last_version - 1, using api.RelationGet(2771761, RelationVersion=previous_version), and apply your data structure optimizations to each one.
  • Stop wherever you want, or once previous_version reaches 1 (inclusive).
```python
import time

import osmapi

api = osmapi.OsmApi()
rel = api.RelationGet(2771761)  # latest version of the relation
last_ver = rel['version']

# Keep only what you actually need from each version; here just the version number.
super_nice_optimal_structure_for_my_specific_needs = [rel['version']]
for i in range(last_ver - 1, 0, -1):  # previous versions, down to 1 inclusive
    print('Getting ' + str(i))
    rel_i = api.RelationGet(2771761, RelationVersion=i)  # error handling / retry left as homework
    super_nice_optimal_structure_for_my_specific_needs.append(rel_i['version'])
    time.sleep(0.5)  # throttle requests to be nice to the API ;-)

print(super_nice_optimal_structure_for_my_specific_needs)
```

In a quick experiment this used at most 43,292 KB (~43 MB) of memory (of course, I was only storing the version numbers) and took 8 min 38.78 s.
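The error handling left as homework in the loop could look like the following minimal sketch: a hypothetical fetch_with_retry helper (not part of osmapi) that retries a call with exponential backoff. By default it catches every Exception; in practice you would narrow retry_on to the timeout and connection errors your osmapi version raises. The sleep parameter is injectable so the helper can be tested without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on failure wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for n in range(attempts):
        try:
            return fetch()
        except retry_on:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** n)  # exponential backoff: 1 s, 2 s, 4 s, ...
    raise AssertionError("unreachable")
```

Inside the loop it would wrap the request, e.g. rel_i = fetch_with_retry(lambda: api.RelationGet(2771761, RelationVersion=i)).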

I think this can be implemented case by case in specific applications, but I am not sure it is worth adding this kind of access pattern to OsmApi itself.

As for the data structure, it might be worth analyzing better general-purpose alternatives, but @metaodi may have a different opinion on this.
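If you need more than the version number, one option is to turn each fetched version into a small immutable record and share identical members, member lists and tag sets between versions, since an edit may change only the tags or only some members. This is a sketch under assumptions: CompactVersion and Compactor are hypothetical names, and it assumes osmapi's relation dicts contain 'version', 'changeset', 'member' (a list of dicts with 'type', 'ref' and 'role') and 'tag' keys.

```python
from typing import Any, Dict, Hashable, NamedTuple, Tuple

Member = Tuple[str, int, str]  # (type, ref, role)


class CompactVersion(NamedTuple):
    version: int
    changeset: int
    members: Tuple[Member, ...]
    tags: Tuple[Tuple[str, str], ...]


class Compactor:
    """Converts relation dicts into CompactVersion records, storing each
    distinct member, member list and tag set only once across versions."""

    def __init__(self) -> None:
        self._pool: Dict[Hashable, Any] = {}

    def _intern(self, value: Hashable) -> Any:
        # Return the previously stored equal object, or remember this one.
        return self._pool.setdefault(value, value)

    def compact(self, rel: Dict[str, Any]) -> CompactVersion:
        members = self._intern(tuple(
            self._intern((m["type"], m["ref"], m["role"]))
            for m in rel.get("member", [])
        ))
        tags = self._intern(tuple(sorted(rel.get("tag", {}).items())))
        return CompactVersion(rel["version"], rel["changeset"], members, tags)
```

In the loop you would create one compactor = Compactor() and append compactor.compact(rel_i) instead of rel_i['version']; versions with an unchanged member list then point at the same tuple instead of holding their own copy.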

@metaodi
Owner

metaodi commented Feb 15, 2022

Yes, it's certainly possible to implement this on top of OsmApi by requesting individual versions, so I don't think it's an urgent issue.

But I can see the appeal of a new flag shallow=True on certain (all?) methods, combined with a mechanism to lazy-load items when they are accessed. I have already built something similar in another library.
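osmapi does not have a shallow=True flag today, so the following is only a sketch of the lazy-loading idea. LazyHistory is a hypothetical read-only mapping from version number to element: it knows the latest version up front, fetches a version only on first access, and keeps at most max_cached versions in memory (least recently used are evicted). It assumes versions 1 to latest can all be requested; if one cannot, whatever fetch_version raises propagates to the caller.

```python
from collections import OrderedDict
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator


class LazyHistory(Mapping):
    """Read-only {version: element} mapping that fetches versions on demand."""

    def __init__(self, latest_version: int,
                 fetch_version: Callable[[int], Dict[str, Any]],
                 max_cached: int = 16) -> None:
        self._latest = latest_version
        self._fetch = fetch_version
        self._max_cached = max_cached
        self._cache: "OrderedDict[int, Dict[str, Any]]" = OrderedDict()

    def __getitem__(self, version: int) -> Dict[str, Any]:
        if version not in self:
            raise KeyError(version)
        if version in self._cache:
            self._cache.move_to_end(version)  # mark as most recently used
            return self._cache[version]
        element = self._fetch(version)  # the network call happens only here
        self._cache[version] = element
        if len(self._cache) > self._max_cached:
            self._cache.popitem(last=False)  # evict the least recently used
        return element

    def __contains__(self, version: object) -> bool:
        # Overridden so membership tests never trigger a fetch.
        return isinstance(version, int) and 1 <= version <= self._latest

    def __iter__(self) -> Iterator[int]:
        return iter(range(1, self._latest + 1))

    def __len__(self) -> int:
        return self._latest
```

Wiring it to osmapi could look like history = LazyHistory(rel['version'], lambda v: api.RelationGet(2771761, RelationVersion=v)); iterating over history.values() then fetches one version at a time instead of loading the whole history at once.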
