Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: batch_path_differ test #5639

Merged
merged 1 commit into from
Oct 5, 2022
Merged

fix: batch_path_differ test #5639

merged 1 commit into from
Oct 5, 2022

Conversation

Laerte
Copy link
Member

@Laerte Laerte commented Sep 26, 2022

Workaround for #5633

While debugging in Windows to identify the root cause i noticed that the test don't fail. So it seems that in Windows the test run too fast and this result in a invalid output, so in order to fix the test i add a delay only for Windows and this make the test suceed, since now the output is correct.

Check #5639 (comment) for details.

@codecov
Copy link

codecov bot commented Sep 26, 2022

Codecov Report

Merging #5639 (3ca7877) into master (79a4bc3) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5639      +/-   ##
==========================================
- Coverage   88.85%   88.83%   -0.02%     
==========================================
  Files         162      162              
  Lines       10964    10964              
  Branches     1894     1894              
==========================================
- Hits         9742     9740       -2     
- Misses        942      943       +1     
- Partials      280      281       +1     
Impacted Files Coverage Δ
scrapy/core/downloader/__init__.py 90.97% <0.00%> (-1.51%) ⬇️

@gliptak
Copy link
Contributor

gliptak commented Sep 27, 2022

I also noticed this error being intermittent

@Gallaecio
Copy link
Member

I am OK with merging this for now as a workaround, but I would not close #5633 with it. I think we still need to find out why this is happening, where the extra item comes from without the delay.

@Gallaecio Gallaecio mentioned this pull request Sep 27, 2022
@Laerte
Copy link
Member Author

Laerte commented Sep 27, 2022

@Gallaecio Is not a extra item that is yielded on Windows.

Here is the output of data['json'] on Linux:

[b'[\n{"foo": "bar1", "egg": "spam1"}\n]', b'[\n{"foo": "bar2", "egg": "spam2", "baz": "quux2"}\n]', b'[\n{"foo": "bar3", "baz": "quux3"}\n]', b'']

4 items, 1 is empty that the reason that test have the:

self.assertEqual(len(items) + 1, len(data['json']))

Since we only have 3 items declared:
items = [
self.MyItem({'foo': 'bar1', 'egg': 'spam1'}),
self.MyItem({'foo': 'bar2', 'egg': 'spam2', 'baz': 'quux2'}),
self.MyItem({'foo': 'bar3', 'baz': 'quux3'}),
]

On Windows this is the output we have for data['json'] without the delay (That leads the test to fail):

[b'[\n{"foo": "bar1", "egg": "spam1"}\n]', b'[\n{"foo": "bar2", "egg": "spam2", "baz": "quux2"}\n]', b'[\n{"foo": "bar3", "baz": "quux3"}\n]']

As you can see we don't have the empty item in list on Windows.

@Laerte
Copy link
Member Author

Laerte commented Sep 27, 2022

@Gallaecio I refactored the test, now should work cross all environments.

@Laerte Laerte changed the title fix: batch_path_differ test in Windows fix: batch_path_differ test Sep 27, 2022
@kmike
Copy link
Member

kmike commented Sep 27, 2022

Do you know why are we getting an empty item on Linux? It's mystery to me :)
If there is a valid reason for that, probably it'd be better to ignore the empty result, instead of allowing both 3 and 4.
If there is no valid reason, we should fix it (and probably have an xfail test on Linux in the meantime).

@Laerte
Copy link
Member Author

Laerte commented Sep 27, 2022

@kmike The test generate 4 files, one of them is empty, this result in an empty item in list, since we parse all the files even the empty ones, we could add a validation in the run_and_export function to skip empty file, but this will result in others tests to fail, e.g: test_export_no_items_not_store_empty

run_and_export function:

for path, feed in FEEDS.items():
dir_name = os.path.dirname(path)
for file in sorted(os.listdir(dir_name)):
with open(os.path.join(dir_name, file), 'rb') as f:
data = f.read()
content[feed['format']].append(data)

I pushed other change doing the second thing that you suggested.

@Gallaecio
Copy link
Member

To me the question is: why is the empty file not generated in Windows? Not generating an empty file when using batch exports would be a nice-to-have feature, but I was under the impression that we did not have that feature yet implemented, and the sleep workaround seems to suggest we don´t. So, is the problem that we are not waiting for the (empty) file to be created, and on Windows that may not be as quick as on Linux? Then the fix is to make sure we wait for the file to be created. The test should actually test that the file is created, so the new implementation I think is the wrong approach. I would rather use the sleep workaround until we find how to make the code wait until the file is created.

@Laerte
Copy link
Member Author

Laerte commented Sep 28, 2022

@kmike @Gallaecio I add sleep again because even we filtering out the empty item we still have diff in output on windows 3 != 2, because on Windows instead of creating 3 files it created 2 and output the last 2 items in the last file with an invalid syntax:

image

@kmike
Copy link
Member

kmike commented Sep 29, 2022

Nice find @Laerte! I think it could make sense to make this test XFAIL on WIndows, and open a new issue for the problem you found.

@Laerte
Copy link
Member Author

Laerte commented Sep 29, 2022

@kmike I used skipif because for some reason with xfail it was still failing, i also left the loop to only grab valid items, i think we can use the same issue that @Gallaecio opened and add more context there, what you think?

@kmike
Copy link
Member

kmike commented Sep 29, 2022

Congrats for making the test suite green @Laerte! :)

What was the issue with xfail, how were you using it?
I think it can be good to add a link to GH issue to the reason.

@Laerte
Copy link
Member Author

Laerte commented Sep 29, 2022

0E867197-B01E-42FB-BDBB-37E8A3254B85

This is how i tried:
23ba1ca

@kmike
Copy link
Member

kmike commented Sep 29, 2022

Aha, so the test is not always failing, and so when strict is used, you get an XPASS error. Would it make sense to remove strict=true?

@Laerte
Copy link
Member Author

Laerte commented Sep 29, 2022

Indeed i tried without strict=True first, odd is that in my branch that only run this test it suceed: Laerte@5405ce5

@gliptak
Copy link
Contributor

gliptak commented Oct 1, 2022

any additional updates needed before this PR ready to be merged?

Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @Laerte!

@kmike
Copy link
Member

kmike commented Oct 2, 2022

@gliptak the PR looks good to me. Next step is to get a second approval from one of the Scrapy maintainers.

@kmike kmike requested a review from wRAR October 4, 2022 08:37
@wRAR wRAR merged commit d96c465 into scrapy:master Oct 5, 2022
@Laerte Laerte deleted the fix/test_batch_path_differ branch October 5, 2022 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants