Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add example of raw->spacetx format conversion for ISS Breast Data #464

Merged
merged 11 commits into from
Aug 23, 2018

Conversation

dganguli
Copy link
Collaborator

This PR:

  1. Introduces get_iss_breast_data.py, which converts raw data from s3://czi.starfish.data.public/browse/raw/20180820/iss_breast/ to spaceTx formatted data. This will be our first real multi-fov dataset.

  2. re-names things. image -> tile, image_fetcher -> tile_fetcher

I tested this PR in a notebook locally. I downloaded the 1 FOV from he breast data formatter and ran the ISS breast notebook against it. It worked.

Remaining work for @ambrosejcarr :

  1. Run this tool.
  2. Upload formatted data to s3 browsable.
  3. Copy codebook.json from 1 FOV dataset over to latter bucket.

Remaining work for @ttung :

  1. Get cracking on experiment API so that...

Remaining work for Deep:

  1. Use experiment API to analyze this entire dataset (blocked by @ttung )
  2. Generate copy number plot for call 2 (high priority @ttung )
  3. Start spec'ing out what multi-fov experiment processing will require (blocked by @ttung )

@codecov-io
Copy link

codecov-io commented Aug 22, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@1b4469c). Click here to learn what that means.
The diff coverage is 81.25%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #464   +/-   ##
=========================================
  Coverage          ?   85.17%           
=========================================
  Files             ?       73           
  Lines             ?     3009           
  Branches          ?        0           
=========================================
  Hits              ?     2563           
  Misses            ?      446           
  Partials          ?        0
Impacted Files Coverage Δ
starfish/experiment/builder/__init__.py 92% <71.42%> (ø)
starfish/experiment/builder/imagedata.py 64.7% <88.88%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b4469c...4fcea96. Read the comment docs.

return dict(zip(range(4), hyb_str))

def get_tile(self, fov: int, hyb: int, ch: int, z: int) -> FetchedTile:
filename = 'slideA_' + str(fov + 1) + '_' + \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string concatenation is a huge mess and very hard to read. consider switching to f-style python strings.


@property
def ch_dict(self):
ch_str = ['Cy3 5', 'Cy3', 'Cy5', 'FITC']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this is better than just a literal dict.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who cares.


@property
def hyb_dict(self):
hyb_str = ['1st', '2nd', '3rd', '4th']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ummmm, dict(enumerate(hyb_str))?

elif self.aux_type == 'dots':
filename = 'slideA_' + str(fov + 1) + '_DO_' + 'Cy3.TIF'
else:
msg = 'invalid aux type: {}'.format(self.aux_type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh, string concatenation!



def format_data(input_dir, output_dir):
if not input_dir.endswith("/"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should not be necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol i copied this from your code in get_iss.py. i can nuke it.

if not output_dir.endswith("/"):
output_dir += "/"

def add_codebook(experiment_json_doc):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm, i think we need a better way to build codebooks programmatically....

parser.add_argument("output_dir", type=FsExistsType())

args = parser.parse_args()
s3_bucket = "s3://czi.starfish.data.public/browse/raw/20180820/iss_breast/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't get this. who's going to do this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ambrose is going to do this. josh is going to read this code. this is also an example to help collaborators understand how to convert their data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this message is confusing. i would just have the --help output tell people where to get the data.

Rationale: the only time this message prints is if the command is running, and to do that, you kind of need the data already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ttung
Copy link
Collaborator

ttung commented Aug 23, 2018

Is it possible to generate a full experiment manifest from this?

@ambrosejcarr
Copy link
Member

ambrosejcarr commented Aug 23, 2018

@ttung I think this is a WIP that I'm supposed to take over. I was looking through the code yesterday and your question was my main question: "how does one pass a directory with multiple fields of view and expect them to be parsed separately? Do I write os.path.walk() into the TileFetcher?

A toy example of that would be very helpful, if you know how it's supposed to function. If that functionality is not supported, that would also be great to know, as I will try to add it.

When I'm finished using this tool to upload the s3 data, I will add a README to it.

@dganguli
Copy link
Collaborator Author

@ttung what do you mean is it possible to generate a full experiment manifest from this. That is the entire point. Unless you have a different definition of experiment manifest than I do? The problem, is that we don't have a way of loading this full experiment manifest. That's why we need the multi-fov API ASAP!

@ambrosejcarr parsing images that live in separate directories is easy; however, that's super unimportant to worry about now. regardless, the way you would handle it, is in TileFetcher.get_tile(...). You would implement the logic needed os.path.join self.input_dir to the right thing based on fov_num, hyb, ch, etc. a more interesting question, is what if you'd like build_experiment_json to write hybridization_fov_xxx.jsons to different directories.

@dganguli
Copy link
Collaborator Author

@ttung i addressed all ur comments in the PR. thx!

@ambrosejcarr
Copy link
Member

@dganguli Ok, so if I wanted to use this tool as is, I should put all the existing images that need to be formatted into the same directory, and it will write a series of hybridization.json files into a single output directory?

I was for some reason stuck on the prior that each FOV would live in its own directory and couldn't reconcile how this would work in that case. I think I'm straightened out.

@dganguli
Copy link
Collaborator Author

@ttung can u approve and merge? i'd like to get the corresponding merfish work done today as well.

crp = img[40:1084, 20:1410]
return crp

def tile_data_handle(self) -> IO:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is supposed to be a @property. it worked before due to a fluke, but the new slicedimage api will require it.

you can make the change now or i can make it when i land the slicedimage change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

parser.add_argument("output_dir", type=FsExistsType())

args = parser.parse_args()
s3_bucket = "s3://czi.starfish.data.public/browse/raw/20180820/iss_breast/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this message is confusing. i would just have the --help output tell people where to get the data.

Rationale: the only time this message prints is if the command is running, and to do that, you kind of need the data already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants