# Corrupted Zipfiles

Of the 40 files we tried to download, some had to be fetched from the [Long-Term Archive](https://scihub.copernicus.eu/userguide/#LTA_Long_Term_Archive_Access).
After retrying the download several times, all files could be retrieved.
However, some of the downloaded zip files are suspiciously small (listed in ascending order of size):

In [15]:
! ls -rSsh input/tempelhofer_feld/*.zip

 25M input/tempelhofer_feld/S2A_MSIL2A_20190623T101031_N0212_R022_T33UUU_20190623T132509.zip
 29M input/tempelhofer_feld/S2B_MSIL2A_20190512T102029_N0212_R065_T33UUU_20190512T134103.zip
 29M input/tempelhofer_feld/S2A_MSIL2A_20190216T102111_N0211_R065_T33UUU_20190216T130428.zip
 30M input/tempelhofer_feld/S2A_MSIL2A_20190424T101031_N0211_R022_T32UQD_20190424T162325.zip
 30M input/tempelhofer_feld/S2A_MSIL2A_20190404T101031_N0211_R022_T32UQD_20190404T174806.zip
 31M input/tempelhofer_feld/S2B_MSIL2A_20190419T101029_N0211_R022_T33UUU_20190419T132322.zip
 35M input/tempelhofer_feld/S2A_MSIL2A_20190613T101031_N0212_R022_T33UUU_20190614T125329.zip
 38M input/tempelhofer_feld/S2A_MSIL2A_20190822T101031_N0213_R022_T32UQD_20190822T143621.zip
 42M input/tempelhofer_feld/S2A_MSIL2A_20190603T101031_N0212_R022_T33UUU_20190603T114652.zip
 43M input/tempelhofer_feld/S2A_MSIL2A_20190407T102021_N0211_R065_T33UUU_20190407T134109.zip
723M input/tempelhofer_feld/S2A_MSIL2A_20190114T101351_N0211_R022_T32U

Trying to extract the smallest of them fails:

In [6]:
! ls -S input/tempelhofer_feld/*.zip | tail -n1 | xargs unzip

Archive:  input/tempelhofer_feld/S2A_MSIL2A_20190623T101031_N0212_R022_T33UUU_20190623T132509.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of input/tempelhofer_feld/S2A_MSIL2A_20190623T101031_N0212_R022_T33UUU_20190623T132509.zip or
        input/tempelhofer_feld/S2A_MSIL2A_20190623T101031_N0212_R022_T33UUU_20190623T132509.zip.zip, and cannot find input/tempelhofer_feld/S2A_MSIL2A_20190623T101031_N0212_R022_T33UUU_20190623T132509.zip.ZIP, period.
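
The same check can be run over the whole folder without extracting anything. The following is a minimal sketch using only the Python standard library; the helper name `find_broken_zips` is hypothetical and not part of the original notebook. `zipfile.is_zipfile` looks for the end-of-central-directory record, which is the signature `unzip` reports as missing.

```python
import zipfile
from pathlib import Path


def find_broken_zips(folder):
    """Return the *.zip files in `folder` that cannot be recognised as zip archives."""
    broken = []
    for path in sorted(Path(folder).glob("*.zip")):
        # A truncated download has no end-of-central-directory record,
        # so is_zipfile returns False for it.
        if not zipfile.is_zipfile(path):
            broken.append(path)
    return broken
```

Calling `find_broken_zips("input/tempelhofer_feld")` would then list the files that need to be fetched again.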


## What does the API say?

In [3]:
import os
import sentinelsat

In [4]:
api = sentinelsat.SentinelAPI(os.getenv('SCIHUB_USERNAME'), os.getenv('SCIHUB_PASSWORD'))

In [5]:
res = api.to_geodataframe(api.query(raw='S2A_MSIL2A_20190623T101031_N0212_R022_T33UUU_20190623T132509'))

In [6]:
res['size']

bedec483-5ee1-4264-8dfa-a3b53ce364f7    816.67 MB
Name: size, dtype: object

The size reported by the SciHub API (816.67 MB) is far larger than the 25 MB file we have on disk.
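
This comparison can be automated for every downloaded file. The sketch below is an assumption-laden illustration, not part of the original notebook: the helpers `parse_size` and `is_truncated` are hypothetical, the size string is assumed to have the form `"<number> <unit>"` as in the output above, and the units are assumed to be binary (1 MB = 1024 * 1024 bytes), which may not match how SciHub computes them. The tolerance leaves room for that uncertainty.

```python
import os

# Assumed unit factors; SciHub may use decimal units instead.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '816.67 MB' into a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit.upper()]


def is_truncated(path, reported_size, tolerance=0.05):
    """Return True if the file is clearly smaller than the reported size."""
    expected = parse_size(reported_size)
    actual = os.path.getsize(path)
    return actual < expected * (1 - tolerance)
```

With the data frame from above, `is_truncated(local_path, res['size'].iloc[0])` would flag the 25 MB file, where `local_path` stands for the path of the matching zip file.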

## Do the downloads fail repeatedly?

All files were downloaded a second time into a separate folder, `input/tempelhofer_feld_test`, so that the checksums of the small files can be compared between the two runs.

In [7]:
! find input/tempelhofer_feld -type f -size -500M  -name '*.zip' | xargs md5sum

9ca05754c4cc5ff9d2bddf99e2e9e753  input/tempelhofer_feld/S2A_MSIL2A_20190603T101031_N0212_R022_T33UUU_20190603T114652.zip
5424cf8c0dd4384382366b37af9ee995  input/tempelhofer_feld/S2A_MSIL2A_20190404T101031_N0211_R022_T32UQD_20190404T174806.zip
f2050867b04f8911dfcd1412846f5f0e  input/tempelhofer_feld/S2A_MSIL2A_20190216T102111_N0211_R065_T33UUU_20190216T130428.zip
5c41f18b6c9745df406dbca49c50b0c7  input/tempelhofer_feld/S2B_MSIL2A_20190419T101029_N0211_R022_T33UUU_20190419T132322.zip
8e9dc7b716056f702912d11197fab44c  input/tempelhofer_feld/S2A_MSIL2A_20190407T102021_N0211_R065_T33UUU_20190407T134109.zip
7241ca7fc6ccca5eb8935efe1b834697  input/tempelhofer_feld/S2B_MSIL2A_20190512T102029_N0212_R065_T33UUU_20190512T134103.zip
7d2b67dac6f36f1d8744ec2ef296445f  input/tempelhofer_feld/S2A_MSIL2A_20190613T101031_N0212_R022_T33UUU_20190614T125329.zip
b078b9d41e7be70a89961214d4adb72b  input/tempelhofer_feld/S2A_MSIL2A_20190424T101031_N0211_R022_T32UQD_20190424T162325.zip
f4a2910be181bd1c85fba14e

In [8]:
! find input/tempelhofer_feld_test -type f -size -500M  -name '*.zip' | xargs md5sum

9ca05754c4cc5ff9d2bddf99e2e9e753  input/tempelhofer_feld_test/S2A_MSIL2A_20190603T101031_N0212_R022_T33UUU_20190603T114652.zip
5424cf8c0dd4384382366b37af9ee995  input/tempelhofer_feld_test/S2A_MSIL2A_20190404T101031_N0211_R022_T32UQD_20190404T174806.zip
f2050867b04f8911dfcd1412846f5f0e  input/tempelhofer_feld_test/S2A_MSIL2A_20190216T102111_N0211_R065_T33UUU_20190216T130428.zip
5c41f18b6c9745df406dbca49c50b0c7  input/tempelhofer_feld_test/S2B_MSIL2A_20190419T101029_N0211_R022_T33UUU_20190419T132322.zip
8e9dc7b716056f702912d11197fab44c  input/tempelhofer_feld_test/S2A_MSIL2A_20190407T102021_N0211_R065_T33UUU_20190407T134109.zip
7241ca7fc6ccca5eb8935efe1b834697  input/tempelhofer_feld_test/S2B_MSIL2A_20190512T102029_N0212_R065_T33UUU_20190512T134103.zip
7d2b67dac6f36f1d8744ec2ef296445f  input/tempelhofer_feld_test/S2A_MSIL2A_20190613T101031_N0212_R022_T33UUU_20190614T125329.zip
b078b9d41e7be70a89961214d4adb72b  input/tempelhofer_feld_test/S2A_MSIL2A_20190424T101031_N0211_R022_T32UQD_2019

The checksums are identical in both folders, so repeated downloads fail in exactly the same way.
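
Instead of comparing the two `md5sum` listings by eye, the check can be scripted. The following is a minimal sketch with the standard library; the function names `md5sum` and `compare_folders` are hypothetical helpers introduced here for illustration.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_folders(first, second, pattern="*.zip"):
    """Map each file name found in both folders to True if the checksums match."""
    result = {}
    for path in sorted(Path(first).glob(pattern)):
        twin = Path(second) / path.name
        if twin.exists():
            result[path.name] = md5sum(path) == md5sum(twin)
    return result
```

If `compare_folders("input/tempelhofer_feld", "input/tempelhofer_feld_test")` returns `True` for every small file, both download runs produced byte-identical truncated files.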

## Manual Download

In [12]:
res['link'].iloc[0]

"https://scihub.copernicus.eu/apihub/odata/v1/Products('bedec483-5ee1-4264-8dfa-a3b53ce364f7')/$value"

Downloading the file manually through the link above also yields a file of only 25 MB.
Since the result is the same without `sentinelsat` involved, this points towards an error on the SciHub side.