eliboss edited this page Mar 29, 2021 · 1 revision

Project flow and how we should start analyzing the data:

① Find all values to compare in a data set

↳ If no values are found, notify the user that there are no valid labels or images.

② For each set of values, group by:

• possible gender for images

• skin tone for images

• For labeled data:
    - ethnicity
    - gender
    - religion
    - income
    - age
    - education

③ For each value, scrape //Somewhere// for demographic data. For each group within a value:

↳ Check that the size of the group in the data is roughly proportional to the size of that demographic.
    >> Also >> Check that each group has a minimum count.

↳ If a group is underrepresented, suggest adding more data from that group.

↳ If a group is overrepresented, mark it as overrepresented.
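
A minimal sketch of the checks in step ③, assuming the demographic data has already been collected into a mapping from group to expected share. The function name `check_representation`, the `tolerance` of 25% and the `min_count` value are hypothetical choices for illustration, not decisions made in these notes.

```python
from collections import Counter

def check_representation(labels, reference_shares, tolerance=0.25, min_count=30):
    """Compare each group's share of the data with its reference share.

    labels: one group label per sample.
    reference_shares: assumed mapping of group -> expected fraction (0..1).
    tolerance: allowed relative deviation from the expected share (assumed).
    min_count: minimum number of samples per group (assumed).
    """
    counts = Counter(labels)
    total = len(labels)
    report = {}
    for group, expected in reference_shares.items():
        count = counts.get(group, 0)
        share = count / total if total else 0.0
        # Too few samples, or a share well below the demographic share.
        if count < min_count or share < expected * (1 - tolerance):
            status = "underrepresented"
        # A share well above the demographic share.
        elif share > expected * (1 + tolerance):
            status = "overrepresented"
        else:
            status = "ok"
        report[group] = {"count": count, "share": share, "status": status}
    return report

# Made-up example data, not real demographic figures.
labels = ["a"] * 70 + ["b"] * 25 + ["c"] * 5
reference = {"a": 0.5, "b": 0.3, "c": 0.2}
report = check_representation(labels, reference, min_count=10)
```

With this made-up data, group `a` is flagged as overrepresented, `b` is within tolerance, and `c` is underrepresented because it falls below the minimum count.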

** Think about how to implement intersectionality **
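
One possible starting point for intersectionality, sketched under the assumption that each sample is a dict of labeled attributes: count samples for every combination of attributes instead of one attribute at a time. The function `intersectional_counts` and the record layout are hypothetical.

```python
from collections import Counter
from itertools import combinations

def intersectional_counts(records, attributes, order=2):
    """Count samples for every combination of `order` attributes.

    records: list of dicts, one per sample (assumed layout).
    attributes: attribute names to cross with each other.
    """
    result = {}
    for attrs in combinations(attributes, order):
        # Key each sample by its tuple of values for this attribute combination.
        result[attrs] = Counter(tuple(r[a] for a in attrs) for r in records)
    return result

# Made-up example records.
records = [
    {"gender": "f", "age": "18-30"},
    {"gender": "f", "age": "31-50"},
    {"gender": "m", "age": "18-30"},
    {"gender": "m", "age": "18-30"},
]
counts = intersectional_counts(records, ["gender", "age"])
```

The resulting counters could then be passed through the same proportionality and minimum-count checks as single attributes. Combinations that never occur, such as `("m", "31-50")` here, have a count of zero.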

④ For numerical values (e.g. skin tone in hex coordinates):

a) Create a range to group values by.

b) Find the variance within each group and between groups.

    ↳ If variance within a group is high, !OR! the median is skewed to one side of the range, this group needs more samples.

    ↳ If variance within a group is low, !AND! the median is close to the middle of the range, this group of samples is representative of the range.
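
A rough sketch of step ④, assuming each value has already been reduced to a single number (for example, one channel or a lightness value derived from the hex color). The notes do not define "high" variance or a "skewed" median, so the thresholds here are assumptions: variance is compared with that of a uniform spread over the range (width² / 12), and the median counts as skewed if it lies more than a quarter of the range width from the midpoint. The name `analyze_ranges` is hypothetical.

```python
import statistics

def analyze_ranges(values, low, high, n_bins=4, max_var_ratio=1 / 12, skew_tolerance=0.25):
    """Group values into equal-width ranges and judge each range."""
    width = (high - low) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        if low <= v <= high:
            # Values equal to `high` go into the last range.
            idx = min(int((v - low) / width), n_bins - 1)
            bins[idx].append(v)

    report = []
    for i, members in enumerate(bins):
        start = low + i * width
        entry = {"range": (start, start + width), "count": len(members)}
        if len(members) < 2:
            # Too few samples to judge variance or median.
            entry["status"] = "needs more samples"
        else:
            variance = statistics.pvariance(members)
            median = statistics.median(members)
            midpoint = start + width / 2
            high_variance = variance > max_var_ratio * width ** 2
            skewed = abs(median - midpoint) > skew_tolerance * width
            entry["variance"] = variance
            entry["median"] = median
            entry["status"] = (
                "needs more samples" if high_variance or skewed else "representative"
            )
        report.append(entry)

    # Variance between groups, taken here as the variance of the group means.
    means = [statistics.mean(m) for m in bins if m]
    between = statistics.pvariance(means) if len(means) > 1 else 0.0
    return report, between

# Made-up example values on a 0..100 scale.
values = [20, 25, 30, 51, 52, 53]
report, between = analyze_ranges(values, low=0, high=100, n_bins=2)
```

In this example the first range is judged representative (low variance, median at the midpoint), while the second needs more samples because its values all sit near the lower edge of the range.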

** Think about making ranges dynamic so that they can adapt to a data set. **

↳ If a data set has a range that does not include the full range possible, include that in the report.
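
The two notes above could be approached as follows. This is a sketch only: `dynamic_edges` uses quantiles so that the ranges adapt to where the data actually lies, and `coverage_gaps` lists the parts of the possible range that the data never reaches. Both function names and the 0 to 255 scale in the example are assumptions.

```python
import statistics

def dynamic_edges(values, n_bins=4):
    """Return range edges placed at quantiles of the data (needs >= 2 values)."""
    # method="inclusive" keeps every cut point between min(values) and max(values).
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    return [min(values)] + cuts + [max(values)]

def coverage_gaps(values, possible_low, possible_high):
    """Return the parts of the possible range that the data does not cover."""
    if not values:
        return [(possible_low, possible_high)]
    observed_low, observed_high = min(values), max(values)
    gaps = []
    if observed_low > possible_low:
        gaps.append((possible_low, observed_low))
    if observed_high < possible_high:
        gaps.append((observed_high, possible_high))
    return gaps

# Made-up example values on an assumed 0..255 scale.
tones = [40, 60, 80, 100, 120, 150, 180, 200]
edges = dynamic_edges(tones)
gaps = coverage_gaps(tones, 0, 255)
```

Here `gaps` contains the uncovered stretches at both ends of the scale, which is the kind of information the report could include.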