Revise plot_out_of_core_classification.py #12694

paragmoteria · 2018-11-28T18:20:19Z

In this example, Eustache Diemert used the first 1000 samples to measure accuracy.
I have extended the example to separate the train and test datasets, following the guideline in the README.txt file shipped with the Reuters-21578 dataset as provided by the UCI ML repository. The test dataset is then used to measure accuracy.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?


paragmoteria · 2018-11-28T18:25:22Z

Please suggest any improvements I should make so that I can build my skills.

jnothman

This looks like a good idea. Comment on the standard train/test split in the docstring of stream_reuters_documents.

Then please apply PEP8 (with the flake8 tool for instance) to fix some cosmetic issues.

examples/applications/plot_out_of_core_classification.py

paragmoteria · 2018-11-29T05:40:35Z

Dear Joel, Greetings! Thanks for your assistance. As per your guidance, I have revised my code to follow PEP8, using flake8. Thanks & Regards, Parag

On Thu, Nov 29, 2018 at 4:36 AM Joel Nothman wrote:

In examples/applications/plot_out_of_core_classification.py:

 tick = time.time()
-X_test_text, y_test = get_minibatch(data_stream, 1000)
+X_text_test, y_test = get_testData(data_stream_test, positive_class)


please use underscores rather than camelCase

eamanu

Looks good.

eamanu · 2018-11-29T12:55:17Z

examples/applications/plot_out_of_core_classification.py

+
+def get_minibatch(doc_iter_train, size, pos_class=positive_class):
+    """Extract a minibatch of examples, return a tuple X_text, y.
+


IMO it would be great to write the parameters and return values in the format used in the sklearn docs. I mean:

    """Extract a minibatch of examples

    Parameters
    ----------
    ...

    Returns
    -------
    ...
    """
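For illustration, here is a minimal sketch of what such a docstring could look like on `get_minibatch`. It is not the code from this PR: the parameter descriptions are my reading of the diff, the `'acq'` default for `pos_class` is an assumption, and plain Python containers stand in for the NumPy arrays the example returns so that the sketch has no dependencies.

```python
import itertools


def get_minibatch(doc_iter, size, pos_class='acq'):
    """Extract a minibatch of examples.

    Parameters
    ----------
    doc_iter : iterator of dict
        Documents with 'title', 'body' and 'topics' keys.
    size : int
        Number of documents to draw, counted before documents with no
        topics assigned are dropped.
    pos_class : str, default='acq'
        Topic treated as the positive class (assumed default).

    Returns
    -------
    X_text : tuple of str
        Title and body of each kept document, joined by a blank line.
    y : list of int
        1 if ``pos_class`` is among the document's topics, else 0.
    """
    data = [
        ('{title}\n\n{body}'.format(**doc), int(pos_class in doc['topics']))
        for doc in itertools.islice(doc_iter, size)
        if doc['topics']
    ]
    if not data:
        return (), []
    X_text, y = zip(*data)
    return X_text, list(y)


# Toy documents standing in for the parsed Reuters stream.
docs = iter([
    {'title': 'A', 'body': 'x', 'topics': ['acq']},
    {'title': 'B', 'body': 'y', 'topics': []},
    {'title': 'C', 'body': 'z', 'topics': ['earn']},
])
X_text, y = get_minibatch(docs, 3)
```

With the section names and underlines in this shape, the docstring renders in the same way as the rest of the scikit-learn API documentation.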

jnothman

still cosmetic nitpicks for now.

examples/applications/plot_out_of_core_classification.py

jnothman · 2018-12-02T22:37:07Z

examples/applications/plot_out_of_core_classification.py

@@ -70,6 +72,9 @@ def __init__(self, encoding='latin-1'):

    def handle_starttag(self, tag, attrs):
        method = 'start_' + tag
+        for attr in attrs:
+            if attr[0] == 'lewissplit':
+                self.LEWisSplit = attr[1]


why this capitalisation?

paragmoteria · 2018-12-03

Following the guidance on splitting the train/test datasets, "LEWISSPLIT" is the attribute that achieves this. I wrote the variable this way so that learners can easily identify the attribute.

jnothman · 2018-12-03

LEWisSplit is not a conventional attribute name. Is lewis_split appropriate?

paragmoteria · 2018-12-03

Yes, it's appropriate

jnothman · 2018-12-05T20:49:39Z

examples/applications/plot_out_of_core_classification.py

@@ -70,6 +72,9 @@ def __init__(self, encoding='latin-1'):

    def handle_starttag(self, tag, attrs):
        method = 'start_' + tag
+        for attr in attrs:


Surely this only applies to a single tag name, not all, and can be handled in the appropriate handle_* method
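A minimal sketch of that suggestion, assuming the parser dispatches to per-tag `start_*` methods as the `method = 'start_' + tag` line in the diff suggests. The class name `SplitAwareParser`, the `splits` list and the sample input are hypothetical; only the `lewis_split` attribute name comes from this thread.

```python
from html.parser import HTMLParser


class SplitAwareParser(HTMLParser):
    """Hypothetical, trimmed-down parser that records LEWISSPLIT values."""

    def __init__(self):
        super().__init__()
        self.lewis_split = None
        self.splits = []  # illustration only: one entry per <REUTERS> tag

    def handle_starttag(self, tag, attrs):
        # Dispatch to a per-tag method; tags without one are ignored.
        method = 'start_' + tag
        getattr(self, method, lambda attrs: None)(attrs)

    def start_reuters(self, attrs):
        # Only the <REUTERS> tag carries LEWISSPLIT, so the lookup lives
        # here. HTMLParser lowercases attribute names before passing them.
        self.lewis_split = dict(attrs).get('lewissplit')
        self.splits.append(self.lewis_split)


parser = SplitAwareParser()
parser.feed(
    '<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" NEWID="1"></REUTERS>'
    '<REUTERS TOPICS="YES" LEWISSPLIT="TEST" NEWID="2"></REUTERS>'
)
parser.close()
```

Keeping the lookup in the tag-specific method means `handle_starttag` no longer loops over the attributes of every tag in the file.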

jnothman · 2018-12-05T20:50:33Z

examples/applications/plot_out_of_core_classification.py


    """

-    DOWNLOAD_URL = ('http://archive.ics.uci.edu/ml/machine-learning-databases/'
+    download_url = ('http://archive.ics.uci.edu/ml/machine-learning-databases/'


Please don't change things like this

paragmoteria · 2018-12-06

Yes, it's true. I must learn to improve my skills.
Thanks for your guidance.

jnothman · 2018-12-05T20:51:42Z

examples/applications/plot_out_of_core_classification.py

@@ -140,20 +147,22 @@ def end_d(self):
        self.topic_d = ""


-def stream_reuters_documents(data_path=None):
+def stream_reuters_documents(data_path=None, train_test="TRAIN"):


Perhaps rename train_test to subset

paragmoteria · 2018-12-06

subset is appropriate, nice!

jnothman · 2018-12-05T20:52:58Z

examples/applications/plot_out_of_core_classification.py

+data_stream_train = stream_reuters_documents(train_test="TRAIN")
+
+# Test Datasets
+data_stream_test = stream_reuters_documents(train_test="TEST")


I've now realised that we are doing two passes through the stream, which kind of defeats the purpose. Either the test set is a prefix, or we collect it while passing through the stream.

jnothman · 2018-12-17

No comment?
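The second option (collecting the test set during the single pass) could be sketched roughly as follows. This is an illustration, not code from the PR: the function name `iter_train_minibatches` and the `'lewis_split'` key on each document are hypothetical, and the sketch assumes each document carries its LEWISSPLIT value as a string.

```python
def iter_train_minibatches(doc_iter, batch_size, test_docs):
    """Yield lists of training documents from one pass over doc_iter.

    Documents marked 'TEST' are appended to ``test_docs`` as they stream
    by; anything that is neither 'TRAIN' nor 'TEST' is skipped.
    """
    batch = []
    for doc in doc_iter:
        split = doc.get('lewis_split')  # hypothetical key
        if split == 'TEST':
            test_docs.append(doc)
        elif split == 'TRAIN':
            batch.append(doc)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly shorter, minibatch


# Toy stream standing in for the parsed Reuters documents.
stream = iter([
    {'id': 1, 'lewis_split': 'TRAIN'},
    {'id': 2, 'lewis_split': 'TEST'},
    {'id': 3, 'lewis_split': 'TRAIN'},
    {'id': 4, 'lewis_split': 'NOT-USED'},
    {'id': 5, 'lewis_split': 'TRAIN'},
])
test_docs = []
batches = list(iter_train_minibatches(stream, 2, test_docs))
```

One trade-off of this design: the test set is only complete once the stream is exhausted, so accuracy measured between minibatches would be computed on whatever test documents have been seen so far. Using a prefix of the stream as the test set avoids that, at the cost of not following the standard split.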

jnothman · 2018-12-17T10:02:46Z

examples/applications/plot_out_of_core_classification.py


+    X_TextTest, y_test = zip(*data_test)


please, camel case does not belong here.

paragmoteria · 2018-12-27

Greetings!

Your previous (first) comment asked me to use underscores rather than camelCase.

So, to stay consistent, I did the same here.

Thank you.

jnothman · 2018-12-30

yes, but you still have camel case here

jnothman

Would you like help from someone else to complete this? I like the idea of reusing standard train-test splits, but we need to maintain the code quality as well.

jnothman · 2019-01-08T01:33:42Z

examples/applications/plot_out_of_core_classification.py

 from sklearn.datasets import get_data_home
+from sklearn.externals.six.moves import html_parser


We no longer support Python 2, so please just use html.parser

jnothman · 2019-01-08T01:33:48Z

examples/applications/plot_out_of_core_classification.py

@@ -70,6 +72,9 @@ def __init__(self, encoding='latin-1'):

    def handle_starttag(self, tag, attrs):
        method = 'start_' + tag
+        for attr in attrs:


Ping.


jnothman · 2019-01-08T01:36:21Z

examples/applications/plot_out_of_core_classification.py

+    if not len(data_train):
+        return np.asarray([], dtype=int), np.asarray([], dtype=int)
+
+    X_text_test, y_train = zip(*data_train)


I'm confused by the mix of the words "test" and "train" here

jnothman · 2019-01-08T01:36:30Z

examples/applications/plot_out_of_core_classification.py

 total_vect_time = 0.0

 # Main loop : iterate on mini-batches of examples
-for i, (X_train_text, y_train) in enumerate(minibatch_iterators):
+for i, (X_TrainText, y_train) in enumerate(minibatch_iterators):


please, no camel case.

paragmoteria · 2019-01-08T02:38:48Z

Dear Sir, Greetings of the day! Thanks for your valuable guidance. Yes, I need help to deliver quality code, so that I can learn more about writing quality code. Thanks, Parag


ogrisel · 2021-02-25T13:44:40Z

@paragmoteria could you please address the comments of the reviewers and push the required changes to the branch of your PR accordingly?

If you do not understand what is required please ask specific questions so that we can help you.

jnothman reviewed Nov 28, 2018


Update plot_out_of_core_classification.py

b4c8b99

paragmoteria changed the title ~~Update plot_plot_out_of_core_classification.py~~ Revise plot_plot_out_of_core_classification.py Nov 29, 2018

eamanu approved these changes Nov 29, 2018


jnothman reviewed Dec 2, 2018


paragmoteria added 2 commits December 3, 2018 07:07

Update plot_out_of_core_classification.py

9c6ff0a

Update plot_out_of_core_classification.py

6ab4d4a

jnothman reviewed Dec 5, 2018


Update plot_out_of_core_classification.py

7750b8b

jnothman reviewed Dec 17, 2018


paragmoteria added 2 commits January 6, 2019 10:45

Update plot_out_of_core_classification.py

cba5c9f

Merge branch 'master' into patch-1

60c779c

jnothman reviewed Jan 8, 2019


jnothman added the help wanted label Jan 17, 2019

paragmoteria changed the title ~~Revise plot_plot_out_of_core_classification.py~~ Revise plot_out_of_core_classification.py Jan 19, 2019

amueller added the Needs work label Aug 6, 2019

Base automatically changed from master to main January 22, 2021 10:50
