Encode _ as -- in metadata when using S3Store.write_doc_to_s3 #532

Merged
mkhorton merged 9 commits into main from s3store-fix on Jan 19, 2022

Member

@mkhorton mkhorton commented Jan 11, 2022

To resolve a reported bug from @acrutt and @rkingsbury with:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutObject operation: There were headers present in the request which were not signed

Fix proposed by @shreddd.

@codecov

codecov bot commented Jan 11, 2022

Codecov Report

Merging #532 (2f7d807) into main (0d49268) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
+ Coverage   88.99%   89.02%   +0.02%     
==========================================
  Files          40       40              
  Lines        2736     2743       +7     
==========================================
+ Hits         2435     2442       +7     
  Misses        301      301              
Impacted Files Coverage Δ
src/maggma/stores/aws.py 88.38% <100.00%> (+0.42%) ⬆️

Last update 93ee676...2f7d807.

  s3_bucket.put_object(
      Key=self.sub_dir + str(doc[self.key]),
      Body=data,
-     Metadata={k: str(v) for k, v in search_doc.items()},
+     Metadata={k.replace('_', '--'): str(v) for k, v in search_doc.items()},
Member

Should we do a single - instead?

Member Author

Will defer to your judgement here. I suggested the double -- on the logic that it would be easier to reverse the operation, assuming that people don't use -- in keys very often, but perhaps this is best avoided since that is not guaranteed.

Member

Hmm - I don't feel strongly about it, but I think -- symbols are a bit confusing. It should be ok with just simplifying to a _ -> - conversion for readability.
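
For anyone weighing the two options later, here is a minimal sketch of the trade-off discussed above. It is not the code in `S3Store`; `encode_key`, `decode_key`, `round_trips`, and the sample `search_doc` values are all made up for illustration.

```python
def encode_key(key: str, sep: str = "--") -> str:
    """Replace underscores in a metadata key with `sep` (hypothetical helper)."""
    return key.replace("_", sep)


def decode_key(key: str, sep: str = "--") -> str:
    """Reverse encode_key. Only exact if the original key never contained `sep`."""
    return key.replace(sep, "_")


def round_trips(doc: dict, sep: str) -> bool:
    """Check whether encoding and then decoding every key recovers the original keys."""
    encoded = {encode_key(k, sep): str(v) for k, v in doc.items()}
    decoded = {decode_key(k, sep): v for k, v in encoded.items()}
    return decoded == doc


# Illustrative document: one key already contains a single hyphen.
search_doc = {"task_id": "abc", "formula_pretty": "Si", "last-updated": "2022-01-11"}

for sep in ("--", "-"):
    print(repr(sep), "round-trips:", round_trips(search_doc, sep))
```

With this sample, `--` survives the round trip and a single `-` does not, because `last-updated` decodes to `last_updated`. Neither choice is safe in general: a key that already contains `--` would break the double-hyphen scheme in the same way.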

@lgtm-com

lgtm-com bot commented Jan 12, 2022

This pull request introduces 1 alert when merging 0189c53 into 93ee676 - view on LGTM.com

new alerts:

  • 1 for First parameter of a method is not named 'self'

@lgtm-com

lgtm-com bot commented Jan 14, 2022

This pull request introduces 1 alert when merging c5ca583 into 93ee676 - view on LGTM.com

new alerts:

  • 1 for First parameter of a method is not named 'self'

@mkhorton mkhorton merged commit 436835b into main Jan 19, 2022
@acrutt
Contributor

acrutt commented Jan 25, 2022

One issue I ran into that might be appropriate to add to this PR: there seems to be a missing requirement in maggma (or maybe atomate?) for boto3, which is needed to use an S3Store. Here is the error I got.

RuntimeError Traceback (most recent call last)
/tmp/ipykernel_61460/520720615.py in <module>
----> 1 t_id = mmdb.insert_task(
2 task_doc,
3 use_gridfs=ft.get("parse_dos", False)
4 or bool(ft.get("bandstructure_mode", False))
5 or ft.get("parse_chgcar", False) # deprecated

/global/u1/a/acrutt/git_mp/atomate/atomate/vasp/database.py in insert_task(self, task_doc, use_gridfs)
153 # upload the data to a particular location and store the reference to that location in the task database
154 for data_key, data_val in big_data_to_store.items():
--> 155 fs_di_, compression_type_ = self.insert_object(
156 use_gridfs=use_gridfs,
157 d=data_val,

/global/u1/a/acrutt/git_mp/atomate/atomate/vasp/database.py in insert_object(self, use_gridfs, *args, **kwargs)
213 """
214 if self._maggma_store_type is not None:
--> 215 return self.insert_maggma_store(*args, **kwargs)
216 elif use_gridfs:
217 return self.insert_gridfs(*args, **kwargs)

/global/u1/a/acrutt/git_mp/atomate/atomate/vasp/database.py in insert_maggma_store(self, d, collection, oid, task_id)
269 doc = {
270 "fs_id": oid,
--> 271 "maggma_store_type": self.get_store(collection).__class__.__name__,
272 "compression": compression_type,
273 "data": d,

/global/u1/a/acrutt/git_mp/atomate/atomate/utils/database.py in get_store(self, store_name)
253 return None
254 if self._maggma_store_type == "s3":
--> 255 self._maggma_stores[store_name] = self._get_s3_store(store_name)
256 # Additional stores can be implemented here
257 else:

/global/u1/a/acrutt/git_mp/atomate/atomate/utils/database.py in _get_s3_store(self, store_name)
288 )
289
--> 290 store = S3Store(
291 index=index_store,
292 sub_dir=f"{self.maggma_store_prefix}_{store_name}",

/global/u1/a/acrutt/git_mp/maggma/src/maggma/stores/aws.py in __init__(self, index, bucket, s3_profile, compress, endpoint_url, sub_dir, s3_workers, key, store_hash, searchable_fields, **kwargs)
68 """
69 if boto3 is None:
---> 70 raise RuntimeError("boto3 and botocore are required for S3Store")
71 self.index = index
72

RuntimeError: boto3 and botocore are required for S3Store

@mkhorton
Member Author

Thanks @acrutt, we could add this. Currently with pip, you would have to pip install maggma[S3] to get the boto requirements.

@munrojm @shyamd should boto3 be a core requirement? i.e., have the expectation that people can always use S3Stores?
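
For context, the `RuntimeError` in the traceback comes from an optional-dependency guard. A rough sketch of that general pattern is below, assuming a hypothetical helper named `require_optional`; it is not maggma's actual code, which checks `if boto3 is None` inside `S3Store.__init__`.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper for illustration only; `extra` is the name of the
    pip extra that would pull the dependency in.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required for this feature; "
            f"try: pip install maggma[{extra}]"
        ) from exc


# A stdlib module stands in for an installed optional dependency.
json_module = require_optional("json", "S3")
print(json_module.__name__)
```

Making boto3 a core requirement would remove the need for a guard like this; keeping it optional means the error message is the only hint a user gets, so it helps if the message names the extra to install.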

@mkhorton
Member Author

I guess the related question is, has boto3 ever been a difficult dependency to install, or is it fairly reliable and lightweight? If the latter, I would err towards including it.

@munrojm
Member

munrojm commented Jan 26, 2022

I haven't ever had any issues with it, so I am not sure. Do you have any opinion on this @shyamd?

@shyamd
Contributor

shyamd commented Jan 26, 2022

None. We made it optional early on to make sure we didn't overcommit on dependencies, but now that it's pretty normal to use, it makes sense to shift to a required dependency.

@munrojm munrojm deleted the s3store-fix branch January 27, 2022 05:05