CurlAccess not creating temp files correctly on Windows #11

Closed
havok2063 opened this issue Nov 17, 2019 · 17 comments

@havok2063
Collaborator

I'm trying to test that this works so I can integrate the PR into Marvin. It currently fails for me testing on my Windows 7 machine. I think the issue here is that the temporary files aren't being created correctly.

To reproduce

from sdss_access.sync import Access, CurlAccess
curl=CurlAccess(public=True,release='DR15',verbose=True)
curl.remote()
curl.add('mangacube', drpver='v2_4_3', plate=8485, ifu=1901)
curl.set_stream()
SDSS_ACCESS> Expanding wildcards 'https://data.sdss.org/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz'
SDSS_ACCESS> Reducing the number of streams from 5 to 1, the number of download tasks.

curl.commit()
SDSS_ACCESS> CREATE \Users\Brian\AppData\Local\Temp\sdss_access\20191117_002
SDSS_ACCESS> streamlets added to \Users\Brian\AppData\Local\Temp\sdss_access\20191117_002
SDSS_ACCESS> [background]$ 'curl  --create-dirs --fail -sSRLK \\Users\\Brian\\AppData\\Local\\Temp\\sdss_access\\20191117_002\\sdss_access_00.txt'

The following error occurs. The same error occurs when using Access instead of CurlAccess.

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-12-6f9c2a224368> in <module>
----> 1 curl.commit()

e:\work\git\sdss\marvin\python\marvin\extern\sdss_access\python\sdss_access\sync\baseaccess.py in commit(self, offset, limit)
    187         self.stream.append_tasks_to_streamlets(offset=offset, limit=limit)
    188         self.stream.commit_streamlets()
--> 189         self.stream.run_streamlets()
    190         self.stream.reset_streamlet()

e:\work\git\sdss\marvin\python\marvin\extern\sdss_access\python\sdss_access\sync\stream.py in run_streamlets(self)
    136             streamlet['logfile'] = open("{0}.log".format(streamlet['path']), "w")
    137             streamlet['errfile'] = open("{0}.err".format(streamlet['path']), "w")
--> 138             streamlet['process'] = self.cli.get_background_process(streamlet['command'], logfile=streamlet['logfile'], errfile=streamlet['errfile'])
    139             if self.verbose:
    140                 print("SDSS_ACCESS> rsync stream %s logging to %s" % (streamlet['index'],streamlet['logfile'].name))

e:\work\git\sdss\marvin\python\marvin\extern\sdss_access\python\sdss_access\sync\cli.py in get_background_process(self, command, logfile, errfile, pause)
     73             stdout = logfile if logfile else STDOUT
     74             stderr = errfile if errfile else STDOUT
---> 75             background_process = Popen(split(str(command), posix=is_posix), env=self.env if 'rsync -' in command else None, stdout=stdout, stderr=stderr)
     76             if pause:
     77                 sleep(pause)

C:\Users\Brian\Anaconda3\envs\marvin\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    798                                 c2pread, c2pwrite,
    799                                 errread, errwrite,
--> 800                                 restore_signals, start_new_session)
    801         except:
    802             # Cleanup if the child failed starting.

C:\Users\Brian\Anaconda3\envs\marvin\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1205                                          env,
   1206                                          os.fspath(cwd) if cwd is not None else None,
-> 1207                                          startupinfo)
   1208             finally:
   1209                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified

Here is my curl.stream.task.

[{'location': 'dr15\\manga\\spectro\\redux\\v2_4_3\\8485\\stack\\manga-8485-1901-LOGCUBE.fits.gz',
  'source': 'https://data.sdss.org/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz',
  'destination': 'C:\\Work\\sdss\\sas\\dr15\\manga\\spectro\\redux\\v2_4_3\\8485\\stack\\manga-8485-1901-LOGCUBE.fits.gz',
  'exists': None}]

and my curl.stream.streamlet

[{'index': 0,
  'location': ['dr15\\manga\\spectro\\redux\\v2_4_3\\8485\\stack\\manga-8485-1901-LOGCUBE.fits.gz'],
  'source': ['https://data.sdss.org/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz'],
  'destination': ['C:\\Work\\sdss\\sas\\dr15\\manga\\spectro\\redux\\v2_4_3\\8485\\stack\\manga-8485-1901-LOGCUBE.fits.gz'],
  'path': '\\Users\\Brian\\AppData\\Local\\Temp\\sdss_access\\20191117_002\\sdss_access_00',
  'command': 'curl  --create-dirs --fail -sSRLK \\Users\\Brian\\AppData\\Local\\Temp\\sdss_access\\20191117_002\\sdss_access_00.txt',
  'logfile': <_io.TextIOWrapper name='\\Users\\Brian\\AppData\\Local\\Temp\\sdss_access\\20191117_002\\sdss_access_00.log' mode='w' encoding='cp1252'>,
  'errfile': <_io.TextIOWrapper name='\\Users\\Brian\\AppData\\Local\\Temp\\sdss_access\\20191117_002\\sdss_access_00.err' mode='w' encoding='cp1252'>}]

My cli.data_dir and cli.tmp_dir are set correctly.

curl.stream.cli.data_dir
'\\Users\\Brian\\AppData\\Local\\Temp'

curl.stream.cli.tmp_dir
'\\Users\\Brian\\AppData\\Local\\Temp'

If I look at the streamlet path location, 'C:\\Users\\Brian\\AppData\\Local\\Temp\\sdss_access\\20191117_002\\sdss_access_00', I see the following files:

sdss_access_00.err
sdss_access_00
sdss_access_00

which I think are supposed to be the .txt and .log files, but they don't look like they are being created correctly.

@joelbrownstein
Contributor

@mstalbot can you please look at this issue?

@havok2063
Collaborator Author

I think it's an issue with Windows truncating the file extensions when writing certain file types, which then messes up the path definitions/locations for the code to recognize. When I try

with open('E:\\Work\\sdss\\test.txt', 'w') as file:
    file.write('stuff here')

I get a text file called test, without the extension, with the content written into it. When I write to test.err, it includes the .err extension and Windows "recognizes" it as an "ERR" file. I can do the same with any other extension, test.other, etc. It seems Windows treats the .txt and .log extensions differently: it recognizes them as text files and removes their extensions. Maybe there's a way to force it to include the extension when it creates the file?

How did this work for you, @mstalbot? I'm running Windows 7. Is there a setting I have turned off or on that's causing this?

@joelbrownstein
Contributor

It is likely that the .txt and .log extensions are simply hidden, not removed. I googled this:
https://www.howtohaven.com/system/show-file-extensions-in-windows-explorer.shtml

@mstalbot
Contributor

I used Windows 10 (64 bit) to run the code for the test.txt example above and can see the .txt extension present in the output file in both 'File Explorer' (i.e. like Finder) and in python. Python only opens the test file if the .txt extension is included in the load. Can you successfully load the test file without stating the extension (hidden or not) into python?
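One way to separate what Explorer displays from what is actually on disk is to ask Python for the directory listing. The sketch below is only an illustration of that check: the helper name written_names is hypothetical, and it writes into a throwaway temporary directory rather than the sdss_access temp folder.

```python
import os
import tempfile


def written_names(filenames):
    """Create each file in a fresh temp directory and return the names the OS reports."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in filenames:
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write("stuff here")
        # os.listdir reports the real file names, independent of any
        # "hide extensions" display setting in a file browser.
        return sorted(os.listdir(tmp))


print(written_names(["test.txt", "test.log", "test.err"]))
```

If the listing shows all three extensions, the files are being created with the right names and only the Explorer view is hiding them.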

This bug is interesting since the curl example above worked for me once #12 was fixed (thus I assumed #11 was fixed as well...sorry about that). I'm looking more into the issue and may be able to test this on a Windows 7 machine. I'll keep you updated.

@mstalbot
Contributor

I suspect I may have found the problem. The error you encountered can be reproduced on Windows 10 by intentionally mistyping 'curl' in the command string in curl.stream.streamlet and running the Popen command. Preserving 'curl ' in the command and intentionally breaking another part of the command string does not return the error you encountered. Windows 7 is about 10 years old and does not have curl pre-installed (in fact, Microsoft is discontinuing updates for Windows 7 this January). A bit of googling revealed that most or all Windows versions before Windows 10 do not have curl pre-installed. Brian: can you type 'curl --help' in the command prompt and verify whether curl is installed?
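The reproduction described above can be sketched outside sdss_access. This is a minimal illustration, not the library's code: the helper can_launch and the deliberately misspelled program name are hypothetical. It relies on the fact that subprocess.Popen raises FileNotFoundError when the executable itself cannot be found, which is the [WinError 2] seen in the traceback on Windows.

```python
import subprocess
import sys


def can_launch(command):
    """Return True if the executable named in `command` can be started."""
    try:
        proc = subprocess.Popen(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        # Raised when the program itself is missing, before any of its
        # arguments (such as a config file path) are ever looked at.
        return False
    proc.wait()
    return True


# A deliberately misspelled, hypothetical program name
print(can_launch(["curl_not_a_real_program", "--help"]))  # False
# The running Python interpreter is known to exist
print(can_launch([sys.executable, "-c", "pass"]))  # True
```

A bad argument to a program that does exist would start the process and fail later with a non-zero return code, which matches the observation that breaking other parts of the command string does not raise this error.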

@mstalbot
Contributor

If cURL is not installed on your system, no worries, since cURL is free to download. This site ('https://develop.zendesk.com/hc/en-us/articles/360001068567-Installing-and-using-cURL#install') gives detailed instructions on how to install cURL on Windows 7 and points you to the necessary download and certificate links from the cURL website. The site also has better info on when cURL started being installed by default on Windows 10 (version 1803 or later).

If your system is 64 bit, you have the option to update to Windows 10 for free. Here is a video tutorial that shows how to do this from the Microsoft web site: 'https://www.youtube.com/watch?v=9iUX7sQeOkg'.

@havok2063
Collaborator Author

Thanks @joelbrownstein, @mstalbot. Very interesting stuff. It's quite possible I don't have curl on my machine. My desktop is quite old at this point. I still might have issues with the hidden file extensions, since I can recreate my problem simply by writing a test file. I'll be able to test all of your suggestions tonight and play around with it.

If it turns out to be particulars of my machine that I can overcome, then I think we should add these points into the documentation for using sdss_access on Windows.

@mstalbot
Contributor

You're welcome. I tested the idea Joel suggested and noticed that selecting 'Hide extensions for known file types' hides all but the .err extensions in my 'File Explorer', while all extensions are still visible in the command prompt.

@joelbrownstein
Contributor

Looks like we're converging on the bugfix by checking that cURL is installed. In addition to documenting this as Brian suggests, I think we should add the equivalent of 'which curl' to CurlAccess and exit verbosely if curl is not installed.
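A cross-platform equivalent of 'which curl' is available in the Python standard library as shutil.which. The sketch below shows one way such a check could look; the function names and the error message are hypothetical and this is not the check that was actually committed to sdss_access.

```python
import shutil


def is_installed(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


def require_executable(name="curl"):
    """Return the full path to `name`, or raise with a verbose message."""
    path = shutil.which(name)
    if path is None:
        # Hypothetical wording; the point is to fail early and clearly
        # instead of surfacing a bare FileNotFoundError from Popen.
        raise RuntimeError(
            "{0} was not found on the PATH. Install {0} and try again.".format(name)
        )
    return path


print(is_installed("curl"))
```

Calling require_executable() before building any streamlets would turn the original [WinError 2] traceback into a single message that names the missing program.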

@mstalbot
Contributor

Agreed.

@mstalbot
Contributor

sdss_access/master now has a curl check and aborts if curl is not installed.

@havok2063
Collaborator Author

Well, it was a curl issue. I didn't have it installed. If you use Anaconda, you can also install it with conda install -c anaconda curl. After installing it, I get a new error. CurlAccess runs fine now, but the curl command fails due to lack of permission to create directories on my E drive. As a test, I manually ran the curl command with the output directory switched to my C drive, and it worked. It created all the necessary directories and downloaded the file. So in principle things work. I now just need to figure out the permissions issue with curl and my E drive. I set full write permissions on E:\\Work and E:\\Work\\sdss, which is where I want the stuff to download, but that didn't work.
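To narrow down a permissions problem like this, it can help to check whether Python itself is able to create the nested directories under the download root. This is only a rough diagnostic sketch: the helper can_create_dirs is hypothetical, the example runs against a temporary directory rather than E:\\Work\\sdss, and a success here does not guarantee that curl will behave the same way.

```python
import os
import tempfile


def can_create_dirs(root, subpath):
    """Try to create root/subpath; return (True, None) or (False, error text)."""
    target = os.path.join(root, subpath)
    try:
        os.makedirs(target, exist_ok=True)
    except OSError as err:
        # PermissionError and similar failures are subclasses of OSError
        return False, str(err)
    return True, None


# Stand-in for the real download root, so the example has no lasting side effects
with tempfile.TemporaryDirectory() as root:
    print(can_create_dirs(root, os.path.join("dr15", "manga", "spectro")))
```

Pointing root at the real destination drive would show whether the failure is specific to curl or applies to any process run by the same user.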

@havok2063
Collaborator Author

Also, this is verbosity that I get when I run CurlAccess.

curl.commit()
SDSS_ACCESS> CREATE C:\Users\Brian\AppData\Local\Temp\sdss_access\20191119_004
SDSS_ACCESS> streamlets added to C:\Users\Brian\AppData\Local\Temp\sdss_access\20191119_004
SDSS_ACCESS> [background]$ 'curl  --create-dirs --fail -sSRLK C:\\Users\\Brian\\AppData\\Local\\Temp\\sdss_access\\20191119_004\\sdss_access_00.txt'
SDSS_ACCESS> rsync stream 0 logging to C:\Users\Brian\AppData\Local\Temp\sdss_access\20191119_004\sdss_access_00.log
SDSS_ACCESS> Done!
SDSS_ACCESS> return code (23,)

At first glance, it looks like it succeeds. I have to go into the sdss_access_00.err file before I discover that there are any issues, where I find "You don't have permission to create E:". The return code tips me off, but only because I recognize it's supposed to be 0. At some point it might be a good idea to have some additional checks after sdss_access completes, based on the error code returned. A message saying the download failed for some reason and to check the error log would probably be sufficient.

@mstalbot
Contributor

mstalbot commented Nov 20, 2019

Interesting, since your error indicates that curl wants to create the E folder. I did not have that problem when I tested curl with home set on the D drive instead of C. I'll think on it.

Also, here is a potential edit for the 'return code' printout in stream.py (in run_streamlets):
replace

if self.cli.returncode and self.verbose:
    print("SDSS_ACCESS> return code {returncode}".format(returncode=self.cli.returncode))

with

if any(returncode for returncode in self.cli.returncode) and self.verbose:
    print("SDSS_ACCESS> FAILED. See %s." % ([streamlet_temp['errfile'].name for streamlet_temp in self.streamlet]))
elif self.verbose:
    print("SDSS_ACCESS> Done!")

and remove the "SDSS_ACCESS> Done!" printout command in cli.py.

@havok2063
Collaborator Author

I like your suggestion. I've incorporated a version of it. Since self.cli.returncode is already a tuple, you don't need to loop over it. You can just call any(self.cli.returncode). I also tweaked the message to just point to the directory rather than printing all the names of the error logs. And since I think we want a success or fail message to display all the time, I removed the verbose condition as well.
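The simplification described here can be sketched as a standalone function. This is an illustration under assumptions, not the code that was committed: the function name report_status, the exact message wording, and the example directory are all hypothetical.

```python
def report_status(returncodes, log_dir):
    """Build a success or failure message from a tuple of process return codes."""
    # any() works directly on the tuple: 0 is falsy, any non-zero code is truthy.
    # Note that a None entry (a process with no return code yet) is also falsy.
    if any(returncodes):
        return "SDSS_ACCESS> FAILED. See the error logs in {0}".format(log_dir)
    return "SDSS_ACCESS> Done!"


# Hypothetical temp directory, shaped like the ones in the output above
example_dir = r"C:\Temp\sdss_access\example"
print(report_status((23,), example_dir))
print(report_status((0, 0), example_dir))
```

Because the message no longer depends on the verbose flag, a failed download such as the return code (23,) case above is reported even when verbose output is off.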

@mstalbot
Contributor

Sounds great! Thanks!

@havok2063
Collaborator Author

This can now be closed


3 participants