This issue was moved to a discussion.

[It took too much time the second time] entry getData #469

Closed
phuong5 opened this issue Dec 14, 2023 · 10 comments

Comments

@phuong5

phuong5 commented Dec 14, 2023

I am encountering an issue when using the @zip.js/zip.js library, version ^2.7.32. I have checked and identified an error in the getData code:

blobWriter = new BlobWriter(getMimeType(entry.filename));
await entry.getData(blobWriter); // => the issue occurs here

The first time I select a zip file (containing other files inside), getData takes about 44 seconds. However, from the second time onwards, the process takes over 10 minutes. I have rerun it multiple times, and the result is consistently the same.
The environments used are:

Edge Version 120.0.2210.61 (Official build) (64-bit)
Chrome Version 119.0.6045.159 (Official Build) (64-bit)
Can anyone help me? Thanks!

@gildas-lormeau
Owner

gildas-lormeau commented Dec 14, 2023

I guess this might be related to your filesystem performance. On my end, I cannot reproduce the issue. See the test here https://jsfiddle.net/uLwjpons/, and below.

<!doctype html>

<html>

<head>
  <title>Test getData in zip.js</title>
  <style>
    body {
      font-family: monospace
    }
  </style>
</head>

<body>
  <script type="module">

    import {
      BlobReader,
      BlobWriter,
      ZipReader,
      ZipWriter,
    } from "https://deno.land/x/zipjs/index.js";

    main().catch(console.error);

    async function main() {
      await log("INIT");
      const zipData = await createFile();
      await log("RUN");
      await runTest(zipData);
      await log("END");
    }

    async function createFile() {
      await log("STEP 1/2 (creating data)");
      const DATA_64_MB = new Array(64 * 1024 * 1024).fill(Math.floor(Math.random() * 128) * 2);
      const ENTRY_DATA = new Blob(new Array(8).fill(DATA_64_MB));
      await log("STEP 2/2 (zipping data)");
      const zipFileWriter = new BlobWriter();
      const entryDataReader = new BlobReader(ENTRY_DATA);
      const zipWriter = new ZipWriter(zipFileWriter);
      await zipWriter.add("test.bin", entryDataReader);
      return zipWriter.close();
    }

    async function runTest(zipData) {
      const zipFileReader = new BlobReader(zipData);
      const zipReader = new ZipReader(zipFileReader);
      const firstEntry = (await zipReader.getEntries()).shift();
      const iterations = new Array(9).fill().map((_, index) => index + 1);
      for (const iteration of iterations) {
        const startTime = performance.now();
        await firstEntry.getData(new BlobWriter());
        await log(`TEST ${iteration}/9 => ${performance.now() - startTime} ms`);
      }
    }

    async function log(value) {
      document.body.innerHTML += `${value}<br>`;
      await pause();
    }

    function pause() {
      return new Promise(resolve => setTimeout(resolve, 500));
    }

  </script>
</body>

</html>

Here are the logs when I run this test in Chrome. Performance is constant.

INIT
STEP 1/2 (creating data)
STEP 2/2 (zipping data)
RUN
TEST 1/9 => 3353 ms
TEST 2/9 => 3126.2000000476837 ms
TEST 3/9 => 3136.100000023842 ms
TEST 4/9 => 3173.699999988079 ms
TEST 5/9 => 3133.800000011921 ms
TEST 6/9 => 3098.800000011921 ms
TEST 7/9 => 3147.600000023842 ms
TEST 8/9 => 3120.699999988079 ms
TEST 9/9 => 3082.199999988079 ms
END
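
The per-iteration timing pattern used in the test above can be factored into a small helper, which makes it easier to compare the first run against later ones. This is only a sketch: `timeRepeated` and `slowdownRatio` are hypothetical names that are not part of zip.js, and the task passed in stands in for whatever call is being measured (for example `() => entry.getData(new BlobWriter())`).

```javascript
// Hypothetical helper (not part of zip.js): runs an async task `runs` times
// in sequence and returns the duration of each run in milliseconds.
async function timeRepeated(task, runs) {
  const durations = [];
  for (let run = 1; run <= runs; run++) {
    const startTime = performance.now();
    await task(run);
    durations.push(performance.now() - startTime);
  }
  return durations;
}

// Ratio of the last duration to the first one. A value close to 1 means
// constant performance; a value well above 1 means later runs are slower.
function slowdownRatio(durations) {
  return durations[durations.length - 1] / durations[0];
}
```

With the logs above, the ratio would be close to 1. With the timings reported in this issue (44 seconds, then about 10 minutes), it would be above 13.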

@phuong5
Author

phuong5 commented Dec 15, 2023

@gildas-lormeau
I uploaded a 2GB file. The first time entry.getData ran, it took 44 seconds, but from the second time onwards it consistently took about 10 minutes. This behavior persisted across 10 attempts. I noticed that the onprogress callback was being called slowly from the second run onwards. My code snippet is below; please check it and help me.

export const isUsingPasswordOrInvalidFileZip = async (
    file: File,
    password?: string
): Promise<...> => {
    let reader: undefined | ZipReader<Blob>;
    try {
        reader = new ZipReader(new BlobReader(file), { password });
        const entries = await reader.getEntries();
        const pathAndFiles = new Map();

        for (const entry of entries) {
            if (!entry.directory) {
                const encoding = detect(entry.rawFilename);
                const textDecoder = new TextDecoder(encoding as string);
                const utf8Path = textDecoder.decode(entry.rawFilename);
                pathAndFiles.set(utf8Path, entry);
            }

            const startTime = performance.now();
            if (entry.getData) {
                await entry.getData(new BlobWriter(), {
                    // for debug
                    onprogress: async (progress, total) => {
                        console.log(progress);
                        console.log(total);
                    },
                });
            }
            console.log(`get data: => ${performance.now() - startTime} ms`);
        }

        if (pathAndFiles.size === 0) {
            return {...}
        }
        return {...}
    } catch (err: any) {
        if (err.message === ERR_ENCRYPTED || err.message === ERR_INVALID_PASSWORD) {
            console.log('password error!');
            return {...}
        }

        if (err.message === ERR_EOCDR_NOT_FOUND) {
            console.log('mime type error!');
            return {...}
        }

        console.log('error when reader:', err.message);
        return ...
    } finally {
        if (reader) {
            await reader.close();
        }
    }
};


export const isUsingPasswordOrInvalidFileZip$ = (
    file: File,
    password?: string
): Observable<{ ... }> => {
    return defer(() => isUsingPasswordOrInvalidFileZip(file, password));
};

@gildas-lormeau
Owner

gildas-lormeau commented Dec 15, 2023

If you pass a Blob instead of a File as parameter to the two exported functions, do you see the same issue? Did you try to run your code on multiple machines?

@phuong5
Author

phuong5 commented Dec 15, 2023

@gildas-lormeau
The file itself is a blob; I don't think there's a need to convert it. Furthermore, I also need to retrieve the entry and save it back, so I believe there's no need to perform any conversion.

@phuong5
Author

phuong5 commented Dec 15, 2023

image

@gildas-lormeau
Owner

gildas-lormeau commented Dec 15, 2023

I know that. I suspect the problem is coming from your filesystem, when reading the compressed data in the ZIP file. That's why I asked you to do a test and pass a Blob. That's also why I asked you if you have tested your code on multiple machines.
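
One way to run the suggested test is to copy the file's bytes into a fresh Blob before handing it to the reader, so that later reads should no longer depend on the file on disk. The sketch below makes that assumption explicit. `toInMemoryBlob` is a hypothetical helper name, the browser ultimately decides where Blob data is stored, and for a file of around 2GB the intermediate ArrayBuffer may exceed what the browser allows.

```javascript
// Hypothetical helper: reads all bytes of a Blob (or File, which is a Blob)
// and wraps them in a new Blob with the same MIME type.
// Assumption: the source is small enough to fit in a single ArrayBuffer.
async function toInMemoryBlob(source) {
  const bytes = await source.arrayBuffer();
  return new Blob([bytes], { type: source.type });
}
```

The result can then be passed to `new BlobReader(...)` in place of the original File. If the timings become constant, the filesystem is the more likely cause.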

@phuong5
Author

phuong5 commented Dec 15, 2023

@gildas-lormeau
I have modified the code as follows:

const fileBlob = new Blob([file], { type: file.type });
reader = new ZipReader(new BlobReader(fileBlob));

I just tried again and noticed that the time has reduced a bit, but there is still an issue because the second time takes longer, and I don't know the reason:

First attempt: 35s
Second attempt: 2m 03s
Third attempt: 5m 24s
Fourth attempt: 10m 47s

Additionally, when running, occasionally the following error appears:

image

@gildas-lormeau
Owner

gildas-lormeau commented Dec 15, 2023

Thank you, maybe you are leaking memory and using the swap too much? Have you looked at what's happening at this level?
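
A possible way to test the memory hypothesis is to extract each entry into a sink that discards the data instead of a BlobWriter, so nothing is retained between runs. The sketch below assumes that `entry.getData` accepts a standard WritableStream as its writer argument. That should be checked against the zip.js version in use. `createCountingSink` is a hypothetical helper, not a zip.js API.

```javascript
// Hypothetical helper: returns a WritableStream that drops every chunk it
// receives and only keeps a running total of the bytes written.
function createCountingSink() {
  const state = { bytes: 0 };
  const writable = new WritableStream({
    write(chunk) {
      // Chunks are expected to be typed arrays (e.g. Uint8Array).
      state.bytes += chunk.byteLength;
    },
  });
  return { writable, state };
}
```

Under that assumption, `await entry.getData(createCountingSink().writable)` would decompress the entry without accumulating a 2GB Blob. If repeated runs stay fast with this sink, memory pressure from the retained Blobs is a plausible cause.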

@gildas-lormeau
Owner

gildas-lormeau commented Dec 15, 2023

Can you reproduce the issue with this test https://run.plnkr.co/preview/clq8gh5r400033b6ti2ixuutk/ in Chrome? If the link is broken, go to https://plnkr.co/edit/C8QoHl0kBD3dQMxV?preview and open the test page in a new tab by clicking the corresponding button in the upper right of the preview page. It's using the filesystem API in order to create the ZIP file on the disk. On my end, I'm still getting constant results.

INIT
STEP 1/1 (creating and zipping data)
RUN
TEST 1/9 => 3232.699999988079 ms
TEST 2/9 => 3199.099999964237 ms
TEST 3/9 => 3300 ms
TEST 4/9 => 3421.100000023842 ms
TEST 5/9 => 3297.199999988079 ms
TEST 6/9 => 3262.2999999523163 ms
TEST 7/9 => 3423.7999999523163 ms
TEST 8/9 => 3212 ms
TEST 9/9 => 3200.199999988079 ms
END

<!doctype html>
<html>

<head>
  <title>Perf test of Entry#getData in zip.js</title>
  <style>
    body {
      font-family: monospace;
    }
  </style>
</head>

<body>
  <button id=runTestButton>Run</button>
  <script type=module>

import {
  BlobReader,
  BlobWriter,
  ZipReader,
  ZipWriter,
} from "https://deno.land/x/zipjs/index.js";

const ZIP_EXTENSIONS_ACCEPT = {
  "application/zip": [".zip"],
};

const ONE_MB = 1024 * 1024;

runTestButton.addEventListener("click", async () => {
  let fileHandle;
  try {
    const suggestedName = [...new Array(16)].map(() => Math.floor(Math.random() * 16).toString(16)).join("") + ".zip";
    fileHandle = await showSaveFilePicker({
      suggestedName,
      mode: "readwrite",
      startIn: "downloads"
    });
    runTestButton.remove();
    await log("INIT");
    const writable = await fileHandle.createWritable();
    const zipWriter = new ZipWriter(writable);
    const addFilePromises = [];
    const writers = [];
    for (let i = 0; i < 4; i++) {
      const transformStream = new TransformStream();
      addFilePromises.push(zipWriter.add(`test${i}.bin`, transformStream.readable));
      writers.push(transformStream.writable.getWriter());
    }
    await log("STEP 1/1 (creating and zipping data)");
    await Promise.all([
      ...writers.map(writer => fillData(writer)),
      ...addFilePromises
    ]);
    await zipWriter.close();
    const file = await fileHandle.getFile();
    const zipReader = new ZipReader(new BlobReader(file));
    const firstEntry = (await zipReader.getEntries()).shift();
    const iterations = new Array(9).fill().map((_, index) => index + 1);
    await log("RUN");
    for (const iteration of iterations) {
      const startTime = performance.now();
      await firstEntry.getData(new BlobWriter());
      await log(`TEST ${iteration}/9 => ${performance.now() - startTime} ms`);
    }
    await log("END");
  } finally {
    if (fileHandle) {
      await fileHandle.remove();
    }
  }
});

async function fillData(writer, currentSize = 0, maxSize = Math.floor((Math.random() * 256) + 512) * ONE_MB) {
  const chunkSize = ONE_MB;
  const chunk = new Uint8Array(chunkSize);
  for (let i = 0; i < chunkSize; i++) {
    chunk[i] = Math.floor(Math.random() * 128) * 2;
  }
  await writer.write(chunk);
  currentSize += chunkSize;
  if (currentSize < maxSize) {
    await fillData(writer, currentSize, maxSize);
  } else {
    await writer.close();
  }
}

async function log(value) {
  document.body.innerHTML += `${value}<br>`;
  await pause();
}

function pause() {
  return new Promise(resolve => setTimeout(resolve, 500));
}

  </script>
</body>

</html>

@phuong5
Author

phuong5 commented Dec 18, 2023

Thank you, Gildas Lormeau. I used a different approach by using the 'encrypt' variable, and it resolved the issue.
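
The comment above does not show the final code, so the following is only one plausible reading. It assumes the 'encrypt' variable refers to a boolean `encrypted` property on each entry object, and that the password check was changed to inspect that flag instead of extracting every entry's data. `hasEncryptedEntry` is a hypothetical helper name.

```javascript
// Hypothetical helper. Assumption: each entry object exposes boolean
// `directory` and `encrypted` properties, as read from the ZIP metadata.
// Returns true if at least one file entry is flagged as encrypted,
// without decompressing any entry data.
function hasEncryptedEntry(entries) {
  return entries.some((entry) => !entry.directory && entry.encrypted === true);
}
```

Since a check like this only reads metadata returned by `getEntries()`, its cost would not depend on the size of the compressed data.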

Repository owner locked and limited conversation to collaborators Dec 18, 2023
@gildas-lormeau gildas-lormeau converted this issue into discussion #470 Dec 18, 2023


2 participants