This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] Genome Ark DataSet #1068

Closed
beck-8 opened this issue Oct 12, 2022 · 64 comments

Comments

@beck-8

beck-8 commented Oct 12, 2022

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

  • Organization Name: Public Genome Ark dataset
  • Website / Social Media: https://vertebrategenomesproject.org/
  • Total amount of DataCap being requested (between 500 TiB and 5 PiB): 5PiB
  • Weekly allocation of DataCap requested (usually between 1-100TiB): 200TiB
  • On-chain address for first allocation: f1i64dht6jd6blfbxmt4xwss4wvf37kn7wjaxe3ui
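For scale, the figures above imply a rough onboarding timeline. The following is a minimal arithmetic sketch, assuming binary units (1 PiB = 1024 TiB) and a constant weekly rate; the function name `weeks_to_onboard` is hypothetical and not part of any Fil+ tooling.

```python
import math

TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def weeks_to_onboard(total_pib: float, weekly_tib: float) -> int:
    """Whole weeks needed to use total_pib of DataCap at a constant weekly_tib rate."""
    return math.ceil(total_pib * TIB_PER_PIB / weekly_tib)


# 5 PiB requested at 200 TiB per week, as stated in this application
print(weeks_to_onboard(5, 200))
```

At the requested rate, the full 5 PiB would take about half a year to onboard.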

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I am a technologist and took part in many tests during Filecoin's early testnet. In each network upgrade I have led my team on the front line, and I have a deep understanding of, and senior experience with, the technical side of Filecoin. I am now participating in the LDN event to help Filecoin store more real data and make Web3 storage real.

What is the primary source of funding for this project?

Self-funded, together with my personal partner.

What other projects/ecosystem stakeholders is this project associated with?

No.

Use-case details

Describe the data being stored onto Filecoin

Near error-free reference genome assemblies of extant vertebrate species, a complete list of extant vertebrate species.

Where was the data in this dataset sourced from?

https://vertebrategenomesproject.org/
The bucket is arn:aws:s3:::genomeark

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

urls:
https://genomeark.s3.us-east-1.amazonaws.com/species/Acanthisitta_chloris/bAcaChl1/genomic_data/10x/bAcaChl1_S1_L004_R1_001.fastq.gz
https://genomeark.s3.us-east-1.amazonaws.com/species/Acanthisitta_chloris/bAcaChl1/genomic_data/10x/bAcaChl1_S1_L004_R2_001.fastq.gz
https://genomeark.s3.us-east-1.amazonaws.com/species/Accipiter_gentilis/bAccGen1/assembly_cambridge/bAccGen1.alt.asm.20211029.fasta.gz
https://genomeark.s3.us-east-1.amazonaws.com/species/Accipiter_gentilis/bAccGen1/assembly_cambridge/bAccGen1.pri.asm.20211029.fasta.gz

Or use the bucket:
arn:aws:s3:::genomeark
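The sample links above are virtual-hosted-style S3 URLs of the form `https://<bucket>.s3.<region>.amazonaws.com/<key>`. As a minimal sketch, the helper below splits such a URL into its bucket, region, and object key so the same object can be located through the bucket ARN. The function name `split_s3_url` is hypothetical, and the code makes no network request.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str, str]:
    """Split a virtual-hosted-style S3 URL into (bucket, region, key)."""
    parsed = urlparse(url)
    host_parts = parsed.netloc.split(".")
    # Expected host shape: <bucket>.s3.<region>.amazonaws.com
    if (
        len(host_parts) != 5
        or host_parts[1] != "s3"
        or host_parts[3:] != ["amazonaws", "com"]
    ):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {url}")
    bucket, region = host_parts[0], host_parts[2]
    key = parsed.path.lstrip("/")
    return bucket, region, key


sample = (
    "https://genomeark.s3.us-east-1.amazonaws.com/species/Acanthisitta_chloris/"
    "bAcaChl1/genomic_data/10x/bAcaChl1_S1_L004_R1_001.fastq.gz"
)
bucket, region, key = split_s3_url(sample)
print(bucket, region, key)
```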

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, it is an AWS open dataset.

What is the expected retrieval frequency for this data?

Not very often.

For how long do you plan to keep this dataset stored on Filecoin?

532 days

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

North America, Korea, China, and Singapore. Other regions are also possible.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Storage providers that are farther away use online transfers, and those that are closer use offline transfers.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Drawing on my previous experience, I will look for SPs with a strong reputation and stable operation. I have already contacted some storage providers.

How will you be distributing deals across storage providers?

If the SPs can accept my deals, I will distribute the deals evenly and find as many SPs as possible to seal them, so that the data is more widely distributed.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

I have all I need to start making deals.
@large-datacap-requests

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and get back to you soon.

@simonkim0515
Collaborator

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

200TiB

Client address

f1i64dht6jd6blfbxmt4xwss4wvf37kn7wjaxe3ui

@large-datacap-requests

large-datacap-requests bot commented Nov 28, 2022

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1i64dht6jd6blfbxmt4xwss4wvf37kn7wjaxe3ui

DataCap allocation requested

100TiB

Id

21258dd7-4543-46a9-8040-bd24fc3c5c7e

@kernelogic

This dataset has a non-standard usage term, need to make sure you are free to distribute it.

https://genome10k.ucsc.edu/data-use-policies/

@beck-8
Author

beck-8 commented Nov 30, 2022

> This dataset has a non-standard usage term, need to make sure you are free to distribute it.
>
> https://genome10k.ucsc.edu/data-use-policies/

Thanks for checking.
Before applying, I fully checked the data's terms of use and the time when the data was generated, and the current application fully meets the conditions.
If you think there is a problem, please let me know.

@newwebgroup

Hey @beck-8
The content looks fine,
What SPs have you found for cooperation?

@beck-8
Author

beck-8 commented Dec 1, 2022

> Hey @beck-8 The content looks fine, What SPs have you found for cooperation?

Hello, I plan to store with these SPs first: f01900525, f01155, and f01971600. I am looking for more SPs to store with and will keep posting updates here.

@kernelogic

This dataset is not using the usual open source licenses. I think you need to contact them for explicit re-distribution approval.

image

@beck-8
Author

beck-8 commented Dec 1, 2022

> This dataset is not using the usual open source licenses. I think you need to contact them for explicit re-distribution approval.
>
> image

Hello, all of our data has been filtered, and the filtering rules are as follows.
image
The relevant information is as follows.

In addition, the official Filecoin blog post below shows that this part of the data has already been stored on the network, so there is no problem.
https://filecoin.io/zh-cn/blog/posts/filedrive/

@kernelogic

I remember this dataset was indeed part of slingshot v2. Proceed.


Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb7tzrwap27oq5cjbk5r4jquyzkwzwrplieynjnqmevuwyvc3w2ce

Address

f1i64dht6jd6blfbxmt4xwss4wvf37kn7wjaxe3ui

Datacap Allocated

100.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

21258dd7-4543-46a9-8040-bd24fc3c5c7e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb7tzrwap27oq5cjbk5r4jquyzkwzwrplieynjnqmevuwyvc3w2ce


Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedapb4zsvhp3h5ej7qqux7bih4xlm7bpevboygvh6z2jszixajsiw

Address

f1i64dht6jd6blfbxmt4xwss4wvf37kn7wjaxe3ui

Datacap Allocated

100.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

21258dd7-4543-46a9-8040-bd24fc3c5c7e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedapb4zsvhp3h5ej7qqux7bih4xlm7bpevboygvh6z2jszixajsiw

@Carohere

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 30% of total datacap - f01697248: 34.28%

Deal Data Replication

⚠️ 69.06% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the full report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
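The two warnings in the report above concern provider concentration (one SP sealing more than 30% of the total DataCap) and replication (deals whose data is held by fewer than 4 SPs). The checker's actual implementation is not shown in this thread, so the following is only a sketch of how such metrics could be computed. The record layout `(provider_id, piece_cid, size_bytes)`, the function names, and the sample identifiers are all assumptions for illustration; both functions expect a non-empty list of deals.

```python
from collections import defaultdict


def provider_shares(deals):
    """Return {provider: percent of total deal size sealed by that provider}.

    deals is a list of (provider_id, piece_cid, size_bytes) tuples (assumed layout).
    """
    totals = defaultdict(int)
    for provider, _cid, size in deals:
        totals[provider] += size
    grand_total = sum(totals.values())
    return {p: 100.0 * s / grand_total for p, s in totals.items()}


def low_replication_percent(deals, min_providers=4):
    """Percent of deals whose piece is held by fewer than min_providers distinct providers."""
    providers_by_piece = defaultdict(set)
    for provider, cid, _size in deals:
        providers_by_piece[cid].add(provider)
    low = sum(1 for _p, cid, _s in deals if len(providers_by_piece[cid]) < min_providers)
    return 100.0 * low / len(deals)


# Hypothetical sample: piece "p1" is replicated on 4 providers, "p2" and "p3" on 1 each.
sample_deals = [
    ("sp-a", "p1", 1), ("sp-b", "p1", 1), ("sp-c", "p1", 1), ("sp-d", "p1", 1),
    ("sp-a", "p2", 1), ("sp-a", "p3", 1),
]
shares = provider_shares(sample_deals)
over_limit = {p: pct for p, pct in shares.items() if pct > 30.0}
print(over_limit)
print(low_replication_percent(sample_deals))
```

In this sample, "sp-a" would be flagged for exceeding the 30% threshold, and the deals for "p2" and "p3" would count toward the low-replication percentage.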

@Patapon0702

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 30% of total datacap - f01697248: 39.02%

Deal Data Replication

⚠️ 73.32% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the full report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@Carohere

Carohere commented Jun 9, 2023

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 23.18%
  • Overall HTTP retrieval success rate: 0.00%
  • Overall Bitswap retrieval success rate: 0.00%

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 30% of total datacap - f01697248: 39.02%

Deal Data Replication

⚠️ 73.32% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.
Click here to view the Retrieval report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

@github-actions github-actions bot added the Stale label Jul 22, 2023
@github-actions

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

@github-actions github-actions bot closed this as not planned Jul 26, 2023

Thanks for your request!
❗ We have found some problems in the information provided.
We could not find Website / Social Media field in the information provided
We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
We could not find On-chain address for first allocation field in the information provided
We could not find Data Type of Application field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.


RootKeyHolders have approved multisig account. You can now request first datacap release
