Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GridFS testing issues #31

Open
samsamros opened this issue Dec 25, 2023 · 4 comments
Open

GridFS testing issues #31

samsamros opened this issue Dec 25, 2023 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@samsamros
Copy link
Collaborator

samsamros commented Dec 25, 2023

As promised, I will begin this thread as I try to debug the issues in the GridFS test.
I'm also testing mongodb v 7.0. So far CRUD operations work fine without enabling full v 7.0 compatibility. I won't upgrade fully to 7.0 until we've run over these issues! Also, tests are still not fully operational in mongoo v 7.0.
So here we go!


compiler: nim v 2.0

Config:

switch("path", "../src")
switch("define", "filename=/home/testing/shared/anonimongo/tests/15MBfile.bin")
switch("define", "saveas=/home/testing/shared/anonimongo/tests/15MBfile-copy.bin")
switch("define", "user=")
switch("define", "testReplication=yes")
switch("define", "testChangeStreams=yes")
hint("ConvFromXToItSelfNotNeeded", off)
switch("define", "exe=mongod")
switch("define", "dbpath=/var/lib/mongodb")
switch("define", "uri")
switch("define", "nomongod")
--verbosity:0
--define:release
switch("define", "asyncBackend=chronos")

First, I tried uploading a small file around 1 kilobytes, which gave me an IndexDefect error, as part of the tests use 5 megabytes as part of the overall test.

See lines 121-122:

 let fivemb = 5.megabytes
      f.setFilePos fivemb

So, I used a 15.2 MB file.
The first error showed the following:

/home/testing/shared/anonimongo/tests/test_gridfs_test.nim(109, 26): Check failed: f.getFileSize == gf.fileSize
f.getFileSize was 1966080
gf.fileSize was 15152802

Inspecting the resulting file saveas I verified it had only downloaded 2 MB, confirmed by the following error:

/home/testing/shared/anonimongo/tests/test_gridfs_test.nim(94, 8): Check failed: wr.success
Grid download file: reason = Incomplete file download; only at 12.97502600509134%
/home/testing/shared/anonimongo/tests/test_gridfs_test.nim(100, 8): Check failed: wr.success
Grid download as: reason = Incomplete file download; only at 12.97502600509134%

I checked with pymongo and python's gridfs driver and got the following:

gridfs.errors.CorruptGridFile: truncated chunk #0: expected chunk length to be 1048576 but found chunk with length 131072

Suggesting this was happening from the get go, when uploading.

I tried uploading a 80 MB file, only 32MB got uploaded, but no error arised.

I began monitoring line 122 in gridfs.nim:

let data = waitfor f.read(chunksize) # to make it work regardless sync/async because asyncfile
echo data.len # <- my addition

which output

131072
131072
131072
...
131072

Even when hardcoding chunks to 261120, the default for GridFS according to MongoDB documentation, it did not work. When I deliberately switched the test to:

    test "Upload file":
      when anoSocketSync:
        wr = grid.uploadFile(filename, chunk = 131072)
      else:
        wr = waitfor grid.uploadFile(filename, chunk = 131072)
      wr.success.reasonedCheck("Grid upload file", wr.reason)

The test ran perfectly. Analyzing further, I hardcoded 1048576 in line 122 in gridfs.nim just to test what was happening:

    let data = waitfor f.read(1048576) # to make it work regardless sync/async because asyncfile
    echo "chunk: " & $chunkSize
    echo data.len

this was the output:

chunk: 1048576
131072
chunk: 1048576

Then I hardcoded this in line 122:

    let data = waitfor f.read(261120) # to make it work regardless sync/async because asyncfile
    echo "chunk: " & $chunkSize
    echo data.len

I got the following for data.len:

131072

edit
May this be a bug in the nim std library? or a protection against buffer overflows?
It's not, works as expected, see comment below

While I have more time to devote to this error. I'm just pointing out this change I made that made the test work:

    test "Create default bucket":
      require db != nil
      when anoSocketSync:
        grid = db.createBucket(chunkSize = 131072)
      else:
        grid = waitfor db.createBucket(chunkSize = 131072)
      require grid != nil

edit
I think this may be a bug in the std library.
It's not, see comment below.

We can run more tests. I will continue through this week to find a solution to the issue. There other issues when uploading other files. I tried uploading a .deb file just for kicks. It uploads alright, but overflows in the remaining tests, and fails. I can download it from gridFS using Python without issues, but there's a point in the test this breaks.
Once we figure out what's happening with the 131072 value, we can move with the rest of the tests!

This is just one step at debugging this, and I hope it helps.

@samsamros
Copy link
Collaborator Author

samsamros commented Dec 26, 2023

Testing async file on its own I had the following:

import std/[asyncfile, asyncdispatch, os]

const target = "path/files"
var f = openAsync(target, fmRead)
let 
    fsize = getFileSize f
    chunkSize = 261120
for _ in countup(0, int(fsize-1), chunkSize):

    let data = waitfor f.read(chunkSize) # to make it work regardless sync/async because asyncfile
    echo data.len

the output:

261120
261120
261120
...
261120
261120
65985

Switching chunkSize to a megabyte (1048576):

1048576
1048576
1048576
...
204225

Works as expected.


The following code, run independently from the test works fine:

import std/[asyncfile, asyncdispatch, os]
import anonimongo

const 
    anoSocketSync* = defined(anoSocketSync)
    URI = "mongodb://localhost:27017/"
    saveas = "/home/testing/mongodb-mongosh_1.6.0_amd64.deb"
    target = "/home/testing/Downloads/mongodb-mongosh_1.6.0_amd64.deb"
    
when not anoSocketSync:
  type TheSock* = AsyncSocket
else:
  type TheSock* = Socket

var mongo = newMongo[AsyncSocket](MongoUri URI)
var grid: GridFS[TheSock]
var db: Database[TheSock]
var wr: WriteResult
let dbname = "newtemptest"

var f = openAsync(target, fmRead)
let 
    fsize = getFileSize f
    chunkSize = 1048576
for _ in countup(0, int(fsize-1), chunkSize):

    let data = waitfor f.read(chunkSize) # to make it work regardless sync/async because asyncfile
    echo data.len
    
    
assert waitFor mongo.connect 
db = mongo[dbname]
    
grid = waitfor db.createBucket(chunkSize = 1.megabytes.int32)

wr = waitfor grid.uploadFile(target)
echo wr.success
wr = waitfor grid.downloadFile(saveas)
echo wr.success

Is there an enforced limit on buffer size in the tests? Checking this out next

@samsamros
Copy link
Collaborator Author

samsamros commented Jan 7, 2024

So, I finally made it work. However, there are a few limitations to the test.
First, I changed a few check macros to doassert; Secondly, I changed all streams above 131072 to 131072. There appears to be a buffer limitation inside the test template. I cannot confirm it yet, but all tests outside it work fine.

Considering the above limitations, I'd like to propose tests outside the test template to run streams above 131072, such as the 5.megabyte stream that is currently in the gridFS test right now.

I'm attaching a diff file with the changes I made to make the test work, pending all of y'alls review:
gridfs.diff.zip

In addition, I'm sharing the diffs here to improve readability and searchability:

diff --git a/tests/test_gridfs_test.nim b/tests/test_gridfs_test.nim
index 112903b..53db33f 100644
--- a/tests/test_gridfs_test.nim
+++ b/tests/test_gridfs_test.nim
@@ -67,9 +67,9 @@ if filename != "" and saveas != "":
     test "Create default bucket":
       require db != nil
       when anoSocketSync:
-        grid = db.createBucket(chunkSize = 1.megabytes.int32)
+        grid = db.createBucket(chunkSize = 131072)
       else:
-        grid = waitfor db.createBucket(chunkSize = 1.megabytes.int32)
+        grid = waitfor db.createBucket(chunkSize = 131072)
       require grid != nil
 
     test "Upload file":
@@ -106,8 +106,9 @@ if filename != "" and saveas != "":
         var gf = grid.getStream(bson({filename: dwfile}), buffered = true)
       else:
         var gf = waitfor grid.getStream(bson({filename: dwfile}), buffered = true)
-      check f.getFileSize == gf.fileSize
-      check f.getFilePos == gf.getPosition
+      doassert f.getFileSize == gf.fileSize
+      doassert f.getFilePos == gf.getPosition
+
 
       let threekb = 3.kilobytes
       when anoSocketSync:
@@ -115,28 +116,32 @@ if filename != "" and saveas != "":
       else:
         var binread = waitfor gf.read(threekb)
       var bufread = waitfor f.read(threekb)
-      check bufread == binread
-      check f.getFilePos == gf.getPosition
+      doassert bufread == binread
+      doassert f.getFilePos == gf.getPosition
+      
 
-      let fivemb = 5.megabytes
-      f.setFilePos fivemb
+      let kbStream = 131072 #changed
+      f.setFilePos kbStream
       when anoSocketSync:
-        gf.setPosition fivemb
+        gf.setPosition kbStream
       else:
-        waitfor gf.setPosition fivemb
-      check f.getFilePos == gf.getPosition
-
+        waitfor gf.setPosition kbStream
+      doassert f.getFilePos == gf.getPosition
+      
+        
       when anoSocketSync:
-        binread = gf.read(fivemb)
+     #it can't be fivemb because of the buffer limit in tests
+        binread = gf.read(kbStream)
       else:
-        binread = waitfor gf.read(fivemb)
-      bufread = waitfor f.read(fivemb)
-      check bufread.len == binread.len
-      check bufread == binread 
-      check f.getFilePos == gf.getPosition
+        binread = waitfor gf.read(kbStream)
+      bufread = waitfor f.read(kbStream)
+      doassert bufread.len == binread.len
+      doassert bufread == binread 
+      doassert f.getFilePos == gf.getPosition
 
       close f
       close gf
+      
 
     test "Gridstream read chunked size":
       let chunkfile = "gs_chunks.mkv"
@@ -246,4 +251,4 @@ if filename != "" and saveas != "":
     close mongo
     if runlocal:
       if mongorun != nil and mongorun.running: kill mongorun
-      close mongorun
\ No newline at end of file
+      close mongorun

Considerations:

  • Make a test without the test template with larger streams. According to the comment above, gridfs is working properly. I do, however, think we should devise a strategy to run these tests outside the test template. However, we should keep the current test, and include these changes if approved.
  • I'm adding this as a diff file for you to review, and once changes are analyzed and collectively approved I can put out a pull request with the changes.
  • the diff file has a small change in a letter inside a variable called chunksize and it was being called chunkSize. So to standardize an improve readability I changed all to chunksize.

@samsamros
Copy link
Collaborator Author

samsamros commented Jan 13, 2024

@mashingan I've tested this with mongodb v7.0... all tests pass in debian. I will begin testing on a few applications. But I think anonimongo is fully compatible with 7.0. Are you ok with me pushing a branch with the changes I made to the GridFS test or would you like to review the diff file first?

@mashingan
Copy link
Owner

Please go ahead to proceed. Appreciated for the efforts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants