Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Request/AvailableDAGs APIs #68

Merged
merged 7 commits into from
Jun 7, 2023
Merged

Conversation

plauche
Copy link
Collaborator

@plauche plauche commented Jun 5, 2023

Adds the RequestAvailableDags API, and association between filenames and DAGs.

@plauche plauche force-pushed the feat/request-available-dags branch from 8f4485c to 299bbb0 Compare June 6, 2023 15:39
@plauche plauche changed the title Adding Get/AvailableDAGs APIs Adding Request/AvailableDAGs APIs Jun 6, 2023
@plauche plauche marked this pull request as ready for review June 6, 2023 15:44
@@ -30,6 +31,7 @@ impl fmt::Debug for StoredBlock {

f.debug_struct("StoredBlock")
.field("cid", &cid_str)
.field("filename", &self.filename)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're storing the blocks in sqlite not in individual files (and if you were I would hope you could derive the filename from the CID).

So I'm guessing this filename member is actually from a UnixFS directory's link to this block? e.g. the directory entry's name for it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename originates in this API message ApplicationAPI::ImportFile { path: String }. We just pull out the "file name" piece of the path in Storage::import_path.

@@ -68,7 +69,11 @@ impl Storage {
}
}
if let Some(root_cid) = root_cid {
if let Some(filename) = path.file_name().and_then(|p| p.to_str()) {
self.provider.name_dag(&root_cid, filename)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting - you're naming root CIDs. I think Pinata does this IIRC (but doesn't really expose it via gateway).

window_size: Option<u32>,
) -> Result<Vec<StoredBlock>> {
let mut base_query = "
WITH RECURSIVE cids(x,y,z) AS (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not familiar with WITH RECURSIVE. That's interesting.
How is performance? Is this function (regardless of how it's implemented) a memory limitation concern?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run into any show stopping performance issues with files in the 5-50 mB range, but also haven't done much memory profiling around larger files here. I think we are Ok for the smaller files scheduled for usage in the demonstration.

}

impl StorageProvider for SqliteStorageProvider {
fn import_block(&self, block: &StoredBlock) -> Result<()> {
self.conn.execute(
"INSERT OR IGNORE INTO blocks (cid, data) VALUES (?1, ?2)",
(&block.cid, &block.data),
"INSERT OR IGNORE INTO blocks (cid, data, filename) VALUES (?1, ?2, ?3)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "or ignore" makes me immediately question what the PK and/or unique constraints are. And looking at that code leads me to a side question: The purpose of an autonumber-style "id"?

But back to the main question: it looks like you did not modify the unique index on CID. So if you add a second file with the same content it gets silently dropped completely - no record of that filename existing. Is this really desired?

Perhaps it's too academic a concern, but in a pure mathematical sense a filename probably should't be a column in blocks, right? It would be a separate table that associates... I guess many-to-many?

})?
// TODO: Correctly catch/log/handle errors here
.filter_map(|cid| cid.ok())
.collect();
Ok(roots)
}

fn name_dag(&self, cid: &str, file_name: &str) -> Result<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conversely this method assumes you want to drop the old name.

I would've guessed there'd be a parameter, perhaps bool or bool-equivalent-enum, specifying whether you'd like to do this or do WHERE cid = ?2 AND filename = NULL

)?;
Ok(())
}

fn get_missing_cid_blocks(&self, cid: &str) -> Result<Vec<String>> {
// First get all block cid+id associated with root cid
let blocks: Vec<(String, Option<i32>)> = self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious that you don't want to use a recursive select here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and fixed this for consistency and in case it becomes an issue later

@@ -103,6 +116,11 @@ pub enum ApplicationAPI {
Version {
version: String,
},
RequestAvailableDags,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I assume it wouldn't be a concern here, are there any cross-version compatibility concerns with changes to this enum? Like, is it append-only for new enum variants or whatever?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there is no assumed cross-version compatibility for these enums. For the purposes of the initial demonstrations we are assuming that all sides are running the same version, and that we have the control to make that so.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A future enhancement along the path to SDK-ifying this software would be building more cross-version compatibility into these enums.

@@ -213,6 +213,9 @@ impl<T: Transport + Send + 'static> Listener<T> {
}
None
}
Message::ApplicationAPI(ApplicationAPI::RequestAvailableDags) => {
Some(handlers::get_available_dags(self.storage.clone())?)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate that this sort of mapping can't be generated.

@@ -418,6 +418,7 @@ impl<T: Transport + Send + 'static> Shipper<T> {
cid: Cid::try_from(block.cid)?.to_string(),
data: block.data,
links,
filename: block.filename,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may very well be misunderstanding the purpose of this mod, but are the various satellites & ground stations supposed to agree on the filenames of roots?

Copy link
Collaborator Author

@plauche plauche Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to give recognizable names to DAGs/root CIDs for recognition by operators and/or observers. Maintaining the same "file name" across satellites & ground stations will hopefully help folks recognize the completion of a DAG transfer.

Copy link
Collaborator

@John-LittleBearLabs John-LittleBearLabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think at some point we should pick one way handle conflicting names (old wins silently, new wins silently, error) and use the same approach in both import_dag and also name_dag.

But if, for right now, the answer is "it doesn't matter which happens", then this is fine. We could, for example, simply ask that they never use a file with the same content as another file they've used.

@plauche
Copy link
Collaborator Author

plauche commented Jun 7, 2023

I do think at some point we should pick one way handle conflicting names (old wins silently, new wins silently, error) and use the same approach in both import_dag and also name_dag.

But if, for right now, the answer is "it doesn't matter which happens", then this is fine. We could, for example, simply ask that they never use a file with the same content as another file they've used.

Sorry, I didn't mean to ignore your two comments on this! Yes I agree that it should be addressed at some point. I created #69 to address this in the future.

@plauche plauche merged commit 07e56f4 into main Jun 7, 2023
@plauche plauche deleted the feat/request-available-dags branch June 7, 2023 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants