This repository has been archived by the owner on Aug 25, 2024. It is now read-only.

source: csv: Add update functionality #17

Merged
merged 4 commits into from
Mar 12, 2019

Conversation

sudharsana-kjl

@sudharsana-kjl sudharsana-kjl commented Mar 11, 2019

Removed

  • Removed the _close method from CSVSource in dffml/dffml/source/csvfile.py

Changed

  • Updated the dump_fd method in CSVSource in dffml/dffml/source/csvfile.py

Added

  • Test for dump_fd method in CSVSource at dffml/tests/source/test_csv.py

@sudharsana-kjl sudharsana-kjl changed the title source: csv: Add update funcationality source: csv: Add update functionality Mar 11, 2019
@johnandersen777

Sweet!! Can you add a test case for this too please?

@johnandersen777

johnandersen777 commented Mar 11, 2019

For the test case, just copy https://github.com/intel/dffml/blob/master/tests/source/test_file.py into test_csv.py and pass it a repo to update. Write it out to a tempfile and check that the headers are there.

You only added close, so don't worry about testing the other methods unless you want to; that can be a separate PR if so.
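
A minimal sketch of the "write it out to a tempfile and check that the headers are there" idea. It uses only the standard library csv and tempfile modules, not the dffml CSVSource API; the row contents, the file name out.csv, and the column list are assumptions chosen for illustration.

```python
import csv
import os
import tempfile

# Hypothetical row standing in for a repo with a prediction set.
# This is plain data, not a dffml Repo object.
fieldnames = ["SepalLength", "SepalWidth", "prediction", "confidence"]
rows = [
    {"SepalLength": 5.9, "SepalWidth": 3.0, "prediction": "1", "confidence": 1.0},
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.csv")

    # Write the header and the row, as a dump step would.
    with open(path, "w", newline="") as fd:
        writer = csv.DictWriter(fd, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Re-open the file and read back only the header line.
    with open(path, newline="") as fd:
        header = next(csv.reader(fd))

# The new columns should appear in the header that was written.
assert "prediction" in header
assert "confidence" in header
```

In the real test the writing would be done by the source itself; only the final header check would carry over from this sketch.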

@johnandersen777

I ran a quick test to verify and it looks like it works! Your test should make a CSVSource with the tempfile module. Call update(some repo with prediction set), then close the source (so you'll have two with blocks): the first one to update, then another with block, which will trigger open, and verify that the prediction and confidence are still set on that repo.

$ dffml predict all \
  -model dnn \
  -sources csv=iris_test.csv \
  -classifications 0 1 2 \
  -features \
    def:SepalLength:float:1 \
    def:SepalWidth:float:1 \
    def:PetalLength:float:1 \
    def:PetalWidth:float:1 \
  -caching \
  -update

...

[
    {
        "classification": "1",
        "extra": {},
        "features": {
            "PetalLength": 4.2,
            "PetalWidth": 1.5,
            "SepalLength": 5.9,
            "SepalWidth": 3.0
        },
        "last_updated": "2019-03-11T09:11:25Z",
        "prediction": {
            "classification": "1",
            "confidence": 1.0
        },
        "src_url": "0"
    },

...

    {
        "classification": "1",
        "extra": {},
        "features": {
            "PetalLength": 4.3,
            "PetalWidth": 1.3,
            "SepalLength": 6.4,
            "SepalWidth": 2.9
        },
        "last_updated": "2019-03-11T09:11:25Z",
        "prediction": {
            "classification": "1",
            "confidence": 1.0
        },
        "src_url": "29"
    }
]
$ dffml list repos -log debug -sources csv=iris_test.csv
DEBUG:dffml.util.cli:Setting <dffml.cli.ListRepos object at 0x7f7b223bcf60>.log = 20
DEBUG:dffml.util.cli:Setting <dffml.cli.ListRepos object at 0x7f7b223bcf60>.sources = [CSVSource('iris_test.csv')]
DEBUG:dffml.source.csv:CSVSource('iris_test.csv') loaded 30 records
Undetermined (0.0% confidence) 0 classified as: 1
SepalLength                   5.9
SepalWidth                    3.0
PetalLength                   4.2
PetalWidth                    1.5
prediction                    1
confidence                    1.0

...

Undetermined (0.0% confidence) 29 classified as: 1
SepalLength                   6.4
SepalWidth                    2.9
PetalLength                   4.3
PetalWidth                    1.3
prediction                    1
confidence                    1.0
DEBUG:dffml.source.csv:CSVSource('iris_test.csv') saved 30 records

As this output shows (prediction and confidence are loaded as features, and each repo is still reported as Undetermined), and as you'll find when you write the test case, you'll need to change the load_fd of CSVFile to grab the new headers you added.

    if data.get('classification') is not None:
        classification = data['classification']
        del data['classification']
        repo = Repo(str(i), data={'features': data,
                                  'classification': str(classification)})
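
One way the load step could separate the new columns from the features, sketched with the standard library only. The helper parse_row and the plain dict it returns are hypothetical stand-ins, not the dffml Repo class or the actual load_fd implementation; the column names prediction and confidence are taken from the output above.

```python
import csv
import io


def parse_row(index, row):
    """Split one CSV row into features, classification, and prediction.

    Hypothetical helper: returns a plain dict shaped like the JSON output
    above rather than constructing a dffml Repo.
    """
    features = dict(row)
    record = {"src_url": str(index), "features": features}

    # Pull the classification column out of the features, if present.
    classification = features.pop("classification", None)
    if classification is not None:
        record["classification"] = str(classification)

    # Pull out the columns added by update so they are not treated as features.
    prediction = features.pop("prediction", None)
    confidence = features.pop("confidence", None)
    if prediction is not None and confidence is not None:
        record["prediction"] = {
            "classification": str(prediction),
            "confidence": float(confidence),
        }
    return record


# In-memory CSV standing in for a file previously written with predictions.
sample = io.StringIO(
    "SepalLength,SepalWidth,PetalLength,PetalWidth,"
    "classification,prediction,confidence\n"
    "5.9,3.0,4.2,1.5,1,1,1.0\n"
)
records = [parse_row(i, row) for i, row in enumerate(csv.DictReader(sample))]
```

Feature values are left as the strings csv.DictReader produces; any type conversion is outside the scope of this sketch.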

@johnandersen777 johnandersen777 added the enhancement New feature or request label Mar 11, 2019
@codecov-io

codecov-io commented Mar 12, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@bf86d24).
The diff coverage is 80.43%.


@@            Coverage Diff            @@
##             master      #17   +/-   ##
=========================================
  Coverage          ?   93.46%           
=========================================
  Files             ?       30           
  Lines             ?     1637           
  Branches          ?      155           
=========================================
  Hits              ?     1530           
  Misses            ?       95           
  Partials          ?       12

Impacted Files              Coverage Δ
tests/source/test_csv.py    100% <100%> (ø)
dffml/source/csvfile.py     54.71% <62.5%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf86d24...19c970a. Read the comment docs.

@johnandersen777 johnandersen777 merged commit ba31ae7 into intel:master Mar 12, 2019
@johnandersen777

Whooo hooooo!!! Thanks @sudharsana-kjl!!! You're the first contributor 🎉 🎉 🎉 🎉 🎉 🎉 🎉

@sudharsana-kjl

Thank you :)

3 participants