add python support for rdd apis #7

relud · 2018-10-15T23:02:53Z

No description provided.

relud · 2018-10-15T23:03:23Z

python/src/mozdata/mozdata.py

        ping = json.dumps({
-            k: v for k, v in kwargs.items() if v is not None
+            k: v for k, v in event.items() if v is not None


Renamed `kwargs` to `event` to better match the Scala implementation.
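For readers skimming the hunk, here is a minimal sketch of what the comprehension does, assuming `event` is a plain dict. The helper name `build_ping` and the sample keys are hypothetical and not taken from `mozdata.py`.

```python
import json


def build_ping(event):
    """Hypothetical helper: serialize an event, dropping None-valued keys."""
    # Only None is filtered out; falsy values such as 0 or "" are kept.
    return json.dumps({k: v for k, v in event.items() if v is not None})


ping = build_ping({"docType": "main", "channel": None, "count": 0})
```

Filtering on `is not None` rather than on truthiness means a legitimate zero or empty string still reaches the serialized ping.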

codecov-io · 2018-10-15T23:08:44Z

Codecov Report

Merging #7 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master     #7   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           5      5           
  Lines         164    176   +12     
  Branches       17     18    +1     
=====================================
+ Hits          164    176   +12

| Impacted Files | Coverage Δ |
|---|---|
| python/src/mozdata/mozdata.py | `100% <100%> (ø)` ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5fabe19...8a05a67.

acmiyaguchi · 2018-10-16T21:05:55Z

.circleci/Dockerfile.python

@@ -9,6 +9,8 @@ RUN dpkg -i /var/cache/apt/archives/*.deb
 RUN pip install pyspark
 COPY python/setup.py /app/python
 RUN mkdir -p src/mozdata && pip install .[dev] && pip uninstall -y mozdata
+# fix for https://github.com/spulec/moto/issues/1793
+RUN pip install 'boto3<1.8'


This isn't in `install_requires` because it's only needed by tests?

acmiyaguchi · 2018-10-16T21:17:35Z

python/tests/conftest.py

+                "test": {
+                    "prefix": "test",
+                    "metadata_prefix": "test",
+                    "bucket": "%s"


Suggested change

-                    "bucket": "%s"
+                    "bucket": "{}"

acmiyaguchi · 2018-10-16T21:17:49Z

python/tests/conftest.py

+                    "bucket": "%s"
+                }
+            }
+        """ % bucket)


Suggested change

-        """ % bucket)
+        """.format(bucket))

Format strings are nicer? Although it doesn't really make a difference.
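One caveat worth sketching: the fixture template is JSON, so it contains literal braces, and `str.format` treats braces as replacement fields. The snippet below is an illustration under assumptions (the bucket name and the exact template layout are hypothetical, modelled on the hunk above). It shows that the `%s` version works as written, while the `.format` version needs every literal brace doubled.

```python
import json

bucket = "test-bucket"  # hypothetical bucket name, for illustration only

# %-style: literal JSON braces pass through untouched.
PERCENT_TEMPLATE = """
{
    "test": {
        "prefix": "test",
        "metadata_prefix": "test",
        "bucket": "%s"
    }
}
"""

# str.format: every literal brace has to be doubled.
FORMAT_TEMPLATE = """
{{
    "test": {{
        "prefix": "test",
        "metadata_prefix": "test",
        "bucket": "{}"
    }}
}}
"""

percent_sources = json.loads(PERCENT_TEMPLATE % bucket)
format_sources = json.loads(FORMAT_TEMPLATE.format(bucket))


def format_without_escaping_fails():
    """Return True if calling .format() on the unescaped template raises."""
    try:
        PERCENT_TEMPLATE.replace("%s", "{}").format(bucket)
    except (KeyError, ValueError, IndexError):
        return True
    return False
```

Both escaped forms produce the same parsed structure, so the choice is mostly stylistic, provided the braces are doubled when switching to `.format`.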

acmiyaguchi · 2018-10-16T21:32:32Z

python/tests/test_read_rdd.py

+
+def test_read_rdd(spark_fake, rdd):
+    # override Dataset.records to return the dataset because we don't need
+    # to test that .records() works and the mock.patch for boto3 breaks it


Even if we don't need to test records, it seems misleading to assert properties about the Dataset class instead of an actual RDD.

Is this the error you were running into?

botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.

I mean, that's one of them.

I also ran into a different exception related to pickling a non-matching class, as a result of combining mock patching and Spark workers.

That's the one I got from setting boto3<1.8 and adding a spark fixture. It seems like a lot of the prep-work in the conftest fixture will go to waste :/
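To make the concern in this thread concrete, here is a rough sketch of the patching pattern under discussion. The `Dataset` class and `read_rdd` function below are hypothetical stand-ins written for this illustration, not the project's real code, and no Spark or boto3 is involved.

```python
from unittest import mock


class Dataset:
    """Hypothetical stand-in for the real Dataset class."""

    def __init__(self, bucket, prefix):
        self.bucket = bucket
        self.prefix = prefix
        self.clauses = {}

    def where(self, **kwargs):
        self.clauses.update(kwargs)
        return self

    def records(self, sc):
        # The real method would list S3 objects and build an RDD.
        raise RuntimeError("needs S3 and a Spark context")


def read_rdd(sc, bucket, prefix, **where):
    """Hypothetical API under test: build a Dataset, then fetch records."""
    return Dataset(bucket, prefix).where(**where).records(sc)


def run_patched():
    # Swap records() for a function that returns the Dataset itself,
    # so the caller can inspect it instead of receiving an RDD.
    with mock.patch.object(Dataset, "records", lambda self, sc: self):
        return read_rdd(None, "test-bucket", "test", docType="main")


result = run_patched()
```

With this patch in place, the test can only assert on `Dataset` attributes such as `result.clauses`, never on RDD contents, which is the gap the review comment points at.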

python/tests/test_read_rdd.py

acmiyaguchi

The issue with boto3 looks troublesome. I've made a suggestion to rename the list_rdd test. Otherwise r+.

relud · 2018-10-17T17:50:52Z

Manually ran the CircleCI tests; they passed on Python 2.7 and 3.7.

Filed #8 to deal with improving read_rdd testing.

relud requested a review from acmiyaguchi October 15, 2018 23:02

relud force-pushed the python_rdd branch 3 times, most recently from 6e6a592 to 8a05a67 Compare October 16, 2018 00:00

acmiyaguchi reviewed Oct 16, 2018

acmiyaguchi approved these changes Oct 16, 2018

add python support for rdd apis

73cdf04

relud force-pushed the python_rdd branch from ee10323 to 73cdf04 Compare October 17, 2018 17:46

relud merged commit 73cdf04 into master Oct 17, 2018

relud deleted the python_rdd branch October 17, 2018 17:46
