Skip to content
This repository has been archived by the owner on Feb 4, 2021. It is now read-only.

add python support for rdd apis #7

Merged
merged 1 commit into from
Oct 17, 2018
Merged

add python support for rdd apis #7

merged 1 commit into from
Oct 17, 2018

Conversation

relud
Copy link
Collaborator

@relud relud commented Oct 15, 2018

No description provided.

ping = json.dumps({
k: v for k, v in kwargs.items() if v is not None
k: v for k, v in event.items() if v is not None
Copy link
Collaborator Author

@relud relud Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed kwargs to event to better match scala implementation

@codecov-io
Copy link

codecov-io commented Oct 15, 2018

Codecov Report

Merging #7 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master     #7   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           5      5           
  Lines         164    176   +12     
  Branches       17     18    +1     
=====================================
+ Hits          164    176   +12
Impacted Files Coverage Δ
python/src/mozdata/mozdata.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fabe19...8a05a67. Read the comment docs.

@relud relud force-pushed the python_rdd branch 3 times, most recently from 6e6a592 to 8a05a67 Compare October 16, 2018 00:00
@@ -9,6 +9,8 @@ RUN dpkg -i /var/cache/apt/archives/*.deb
RUN pip install pyspark
COPY python/setup.py /app/python
RUN mkdir -p src/mozdata && pip install .[dev] && pip uninstall -y mozdata
# fix for https://github.com/spulec/moto/issues/1793
RUN pip install 'boto3<1.8'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't in the install_requires because it's only needed by tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

"test": {
"prefix": "test",
"metadata_prefix": "test",
"bucket": "%s"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"bucket": "%s"
"bucket": "{}"

"bucket": "%s"
}
}
""" % bucket)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
""" % bucket)
""".format(bucket))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format strings are nicer? Although it doesn't really make a difference.


def test_read_rdd(spark_fake, rdd):
# override Dataset.records to return the dataset because we don't need
# to test that .records() works and the mock.patch for boto3 breaks it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we don't need to test records, it seems misleading to assert properties about the Dataset class instead of an actual RDD.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the error you were running into?

botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean, that's one of them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i also ran into a different exception related to pickling a non-matching class, as a result of combining mock patching and spark workers

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the one I got from setting boto3<1.8 and adding a spark fixture. It seems like a lot of the prep-work in the conftest fixture will go to waste :/

Copy link

@acmiyaguchi acmiyaguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with boto3 looks troublesome, I've made a suggestion to rename the list_rdd test. Otherwise r+.

@relud relud merged commit 73cdf04 into master Oct 17, 2018
@relud relud deleted the python_rdd branch October 17, 2018 17:46
@relud
Copy link
Collaborator Author

relud commented Oct 17, 2018

manually ran circleci tests, passed in python 2.7 and 3.7

filed #8 to deal with improving read_rdd testing

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants