Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Blob reading APIs (arrayBuffer/text) #24287

Closed
jdm opened this issue Sep 25, 2019 · 38 comments
Closed

Implement Blob reading APIs (arrayBuffer/text) #24287

jdm opened this issue Sep 25, 2019 · 38 comments

Comments

@jdm
Copy link
Member

@jdm jdm commented Sep 25, 2019

Spec:

Code:

  • components/script/dom/webidls/Blob.webidl
  • compoentns/script/dom/blob.rs

Tests:

  • ./mach test-wpt tests/wpt/web-platform-tests/FileAPI/blob/Blob-array-buffer.any.js
  • ./mach test-wpt tests/wpt/web-platform-tests/FileAPI/blob/Blob-text.any.js
@jdm
Copy link
Member Author

@jdm jdm commented Sep 25, 2019

In-progress steps:

  • add the missing WebIDL methods for Blob.arrayBuffer and Blob.text to Blob.webidl
  • implement the missing Rust methods for the Blob type in blob.rs based on the specification for the methods
  • extract read_file into two separate functions that are called by read_file:
    • one that accepts &GlobalScope and Uuid arguments and returns an IpcReceiver<profile_traits::FileManagerResult<ReadFileProgress>>
    • one that accepts an IpcReceiver<profile_traits::ReadFileProgress> and returns Result<Vec<u8>, ()>
  • create a new read_file_async method that calls these helper methods, but calls the reader loop helper inside of a spawned thread
    • make read_file_async accept a Rc<Promise> argument and convert it to a TrustedPromise
    • use the return value of the helper method to
@mvashishth
Copy link

@mvashishth mvashishth commented Sep 29, 2019

I'd love to work on it. Can you assign it to me?

@CYBAI CYBAI added the C-assigned label Sep 29, 2019
@CYBAI
Copy link
Collaborator

@CYBAI CYBAI commented Sep 29, 2019

@mvashishth Go ahead :)

@mvashishth
Copy link

@mvashishth mvashishth commented Sep 29, 2019

Can I get more information about "get stream" algorithm? Can't find it anywhere.

@jdm
Copy link
Member Author

@jdm jdm commented Sep 30, 2019

@mvashishth Since Servo does not support streams yet, my incomplete steps from above were designed to implement a stream-like file reading operation.

@jdm
Copy link
Member Author

@jdm jdm commented Sep 30, 2019

The easiest first step is to implement the new methods synchronously and resolve the Promise value with the blob contents. Once that is done, it will be easier to focus on making the actual operation asynchronous.

@mvashishth
Copy link

@mvashishth mvashishth commented Sep 30, 2019

The documentation says Blob has an associated get stream algorithm. I didn't find it anywhere or am I missing something?

@jdm
Copy link
Member Author

@jdm jdm commented Sep 30, 2019

You are not missing anything. Servo doesn't support streams yet, so we don't implement that algorithm yet.

@mvashishth
Copy link

@mvashishth mvashishth commented Oct 1, 2019

Blob.text()
Blob.arrayBuffer()
If I am not wrong, these are the javascript equivalents for the methods I am working on right?
Also can you please give me a use case for both the methods?
Thanks

@jdm
Copy link
Member Author

@jdm jdm commented Oct 1, 2019

Yes, you are correct. These APIs are used to extract the contents of a Blob into a value that JS code that interact with. Websites could create a Blob and store it in some persistent storage (like IndexedDB), and then later get the blob out of the storage and use these APIs to read the blob's contents. You can learn more about Blobs at https://www.javascripture.com/Blob.

@mvashishth
Copy link

@mvashishth mvashishth commented Oct 1, 2019

The scope of this issue only covers arrays as blobs right?

@jdm
Copy link
Member Author

@jdm jdm commented Oct 1, 2019

As opposed to what? The scope of this issue is the unimplemented arrayBuffer and text methods from the Blob specification.

@mvashishth
Copy link

@mvashishth mvashishth commented Oct 3, 2019

I tried using the blob.text() implemented in Chromium. It returns a string made of all the elements of the blob array. Any reason why we need to convert it to byte sequence first using the "get stream" algorithm?

@jdm
Copy link
Member Author

@jdm jdm commented Oct 3, 2019

@mvashishth All blob contents are stored as a Vec<u8> internally. That means that our version of getting the stream obtains a copy of that vec and them converts that into a string or arraybuffer as appropriate. Does that answer your question?

@mvashishth
Copy link

@mvashishth mvashishth commented Oct 8, 2019

Servo takes an hour to build on my laptop. Is there a way I can only build specific components? Also, can I ask Rust and WebIDL syntax related questions here?

@jdm
Copy link
Member Author

@jdm jdm commented Oct 8, 2019

You can run ./mach build -d -p script to only build changes in code in components/script and anything it depends upon, but I doubt it would make much difference. ./mach check will typecheck your changes without building binaries and should be much quicker. And yes, you can ask questions about Rust and WebIDL here.

@mvashishth
Copy link

@mvashishth mvashishth commented Oct 24, 2019

Is there a simpler way of making the get_bytes function? At the minimum, it should be able to just read the contents of the blob object and run the get stream of algorithm right? Can I do a python implementation of the function first and get that reviewed? I am having a hard time understanding the syntax of Rust. It'd be much better if I can understand the program structure in a programming language I am familiar with.

@jdm
Copy link
Member Author

@jdm jdm commented Oct 24, 2019

If writing pseudocode in another language will help you understand what needs to be done in Rust, that's fine.

@jdm
Copy link
Member Author

@jdm jdm commented Oct 24, 2019

What I meant by #24287 (comment) is that for the first step, calling get_bytes() directly to retrieve the contents of the Blob will give us the bytes that we can use to create a blob or an arraybuffer, and we can then create a promise value and resolve the promise with the blob or arraybuffer. From there it will be easier to work on making it asynchronous instead of synchronous.

@mvashishth
Copy link

@mvashishth mvashishth commented Oct 24, 2019

Can you explain how exactly get_bytes() works?

@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 7, 2020

Hey @mvashishth, are you still working on this issue? If not, I'd like to try it.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 7, 2020

To answer the previous question (sorry for missing it), get_bytes has three different cases:

  • the blob is stored in memory, so we can clone the bytes directly
  • the blob represents a file, so we send a message to the blob manager thread asking for the bytes, and it opens the file and returns them to us
  • the blob is a slice of another blob, so we get the bytes of the original (whether it's file or memory-based) and return the corresponding slice of those bytes
@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 8, 2020

I see two methods in promises to create a new promise- one creates a new compartment and the other does the work in the current compartment. I am having a bit of trouble in understanding compartments and when to use which method. I read a little about compartments in the josephine crate, which helped a bit, but the usage of both the methods is still unclear. Can you explain that?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 8, 2020

Some of the compartment terminology is outdated, unfortunately, and we should be talking about entering realms instead. However, the short version is that promise objects have to be allocated in a particular realm. The API exists to ensure that the caller has either verified that we've already entered a realm (by passing the InCompartment argument) or asserts if there a realm has actually not been entered. In the short term you can use the API that creates a promise in the current compartment, and we can sort out whether that's appropriate during review.

@mvashishth
Copy link

@mvashishth mvashishth commented Jan 8, 2020

@jdm You can assign this issue to @kunalmohan. Apologies for not being able to contribute to it.

@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 9, 2020

I just wanted to run through my understanding of the readasArrayBuffer method.

// step 1
let blob_contents = FileReaderSync::get_blob_bytes(blob)?;
// step 2
unsafe {
rooted!(in(*cx) let mut array_buffer = ptr::null_mut::<JSObject>());
assert!(ArrayBuffer::create(
*cx,
CreateWith::Slice(&blob_contents),
array_buffer.handle_mut()
)
.is_ok());
Ok(NonNull::new_unchecked(array_buffer.get()))

line 93: get blob contents as Vec<u8>
97: create a mutable null pointer (I am not sure what null_mut::<JSObject>() does) in the JSContext accepted as the parameter.
98: create a new arraybuffer object and assert its success.
105: return Nonnull type pointer to newly created arraybuffer(JSObject).
please explain the line 97 and correct any other mistakes I have made in the understanding of the method

Also how should the method of processing blob contents in ArrayBuffer be different from the readasArrayBuffer method?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 9, 2020

line 97 is calling ptr::null_mut() and using an explicit type annotation to ensure that we're getting a *mut JSObject pointer that is null.

I'm not sure I understand your second question, however. In the initial synchronous implementation, I believe we will be creating an ArrayBuffer the exact same way.

@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 9, 2020

Yes, that's what I was asking for now, the synchronous implementation. Thanks!

@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 11, 2020

  • extract read_file into two separate functions that are called by read_file:

Can you explain how the read_file function works?

  • one that accepts &GlobalScope and Uuid arguments and returns an IpcReceiver<profile_traits::FileManagerResult<ReadFileProgress>>
  • one that accepts an IpcReceiver<profile_traits::ReadFileProgress> and returns Result<Vec<u8>, ()>

Also, shouldn't the second function here accept IpcReceiver<profile_traits::FileManagerResult<ReadFileProgress>> as parameter?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 11, 2020

read_fike sends a message to the file handler thread (https://github.com/servo/servo/blob/master/components/net/filemanager_thread.rs#L424) containing a sender, and it reads from the corresponding receiver to get the content of the blob in sequential messages.

And I believe you are right about your second question.

@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 12, 2020

I believe I have successfully implemented the read_file function (as most of the tests are now passing).

I have also implemented the read_file_async function but the types are yet to be made thread-safe. I'll need some guidance here (I'm working with threads for the first time). I know that Arc<T> is thread-safe and should be used instead of Rc<T>. And then RefCell<T>, Cell<T> and DomRefCell<T> have to be taken care of. I guess Mutex<T> would work here.
(I might be wrong in my approach above 🤔 )

Also the read_file_async function should return the TrustedPromise, right?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 12, 2020

I think it's going to be hard to provide any feedback about the previous comment without seeing the code. Please push it to a branch on your fork and link it if you would like any particular feedback at this point :)

I don't think read_file_async needs to return the TrustedPromise. The caller already has a Rc<Promise> value that it can clone if it needs access to the promise.

@jdm jdm closed this Jan 12, 2020
@jdm jdm reopened this Jan 12, 2020
@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 12, 2020

I have pushed the changes to my fork (branch 24287-BlobAPI).

Please check the read_file_async function definition (link below)
https://github.com/kunalmohan/servo/blob/d081e39adb5f74ef3f794587655557c8c34f1f5f/components/script/dom/globalscope.rs#L1251-L1260

@jdm
Copy link
Member Author

@jdm jdm commented Jan 13, 2020

@kunalmohan Ok, here's the design we need to follow for the async read:

  • convert the promise into a TrustedPromise before spawning the new thread
  • add a &GlobalScope argument, and use the task_canceller and file_reading_task_source APIs to get task source and canceller objects (eg.
    let canceller = global.task_canceller(TaskSourceName::FileReading);
    let task_source = global.file_reading_task_source();
    )
  • use the task source to queue a task (using the queue_with_canceller API) inside of the thread after reading the blob's bytes (eg.
    let _ = task_source.queue_with_canceller(
    task!(process_image_response_for_environment_change: move || {
    let element = element.root();
    // Ignore any image response for a previous request that has been discarded.
    if generation == element.generation.get() {
    element.process_image_response_for_environment_change(image,
    USVString::from(selected_source_clone), generation, selected_pixel_density);
    }
    }),
    &canceller,
    ) - this task will root() the TrustedPromise to turn it back into an Rc<Promise>, and then use resolve_native to resolve the promise with the blob's bytes

This removes the need to wait for the thread to finish running, and passes the promise object along through the thread until it can queue a task for the original thread to run that resolves the promise object. Does that make sense?

@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 13, 2020

Yes, I think I should be able to implement it now.
Just a small query- since the read file functions are now implemented for struct GlobalScope, is there a need to separately take &GlobalScope parameter?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 13, 2020

Ah, no. I missed that; good catch!

@kunalmohan
Copy link
Collaborator

@kunalmohan kunalmohan commented Jan 14, 2020

How do I ensure thread safety for RefCell<T>, Cell<T>, UnsafeCell<T>, Rc<T>? Should I use wrapper struct around them and implement Sync and Send traits for the wrapper struct? Or are there similar atomic wrappers for the above types (like Arc<T>)?

error[E0277]: `std::cell::Cell<()>` cannot be shared between threads safely
    --> components/script/dom/globalscope.rs:1257:23
     |
1257 |         let handler = thread::spawn( move || {
     |                       ^^^^^^^^^^^^^ `std::cell::Cell<()>` cannot be shared between threads safely
     |
     = help: within `dom::globalscope::GlobalScope`, the trait `std::marker::Sync` is not implemented for `std::cell::Cell<()>`
     = note: required because it appears within the type `std::marker::PhantomData<std::cell::Cell<()>>`
     = note: required because it appears within the type `ipc_channel::platform::unix::OsIpcSender`
     = note: required because it appears within the type `ipc_channel::ipc::IpcSender<script_traits::TimerSchedulerMsg>`
     = note: required because it appears within the type `dom::globalscope::GlobalScope`
     = note: required because of the requirements on the impl of `std::marker::Send` for `&dom::globalscope::GlobalScope`
     = note: required because it appears within the type `[closure@components/script/dom/globalscope.rs:1257:38: 1266:10 self:&dom::globalscope::GlobalScope, recv:profile_traits::ipc::IpcReceiver<std::result::Result<net_traits::filemanager_thread::ReadFileProgress, net_traits::filemanager_thread::FileManagerThreadError>>, bytes:std::vec::Vec<u8>, task_source:task_source::file_reading::FileReadingTaskSource, trusted_promise:dom::bindings::refcounted::TrustedPromise, canceller:task::TaskCanceller]`

(26 similar errors are encountered right now)

@jdm
Copy link
Member Author

@jdm jdm commented Jan 14, 2020

That happens when you reference self inside of the spawned thread. We need to make local variables of the values that are needed before spawning the thread and reference those, instead.

@kunalmohan kunalmohan mentioned this issue Jan 14, 2020
4 of 4 tasks complete
bors-servo added a commit that referenced this issue Jan 14, 2020
Implement Blob methods (text/arraybuffer)

<!-- Please describe your changes on the following line: -->
#24287 (comment)
#24287 (comment)

r?@jdm

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes fix #24287  (GitHub issue number if applicable)

<!-- Either: -->
- [X] There are tests for these changes

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->
bors-servo added a commit that referenced this issue Jan 28, 2020
Implement Blob methods (text/arraybuffer)

<!-- Please describe your changes on the following line: -->
#24287 (comment)
#24287 (comment)

r?@jdm

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes fix #24287  (GitHub issue number if applicable)

<!-- Either: -->
- [X] There are tests for these changes

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants
You can’t perform that action at this time.