## About last time

- Data partitioning is a crucial step when performing experiments $\rightarrow$ *biases in evaluation are just around the corner*

- Data leakage covers a wide spectrum of possible errors, including preprocessing and data partitioning $\rightarrow$ **leakage is sometimes hard to detect!**

- Seeding controls experimental stochasticity, but in many cases its use can be harmful $\rightarrow$ *yet another hyper-parameter*

- Statistical significance is a powerful tool that is unfortunately often misused! $\rightarrow$ **there is a huge lack of education on this topic**

- Metrics are a standard tool for system evaluation. However, they only provide a specific point of view on a problem $\rightarrow$ **make sure you consider different metrics and do not copy-paste results from the literature**

## About this lecture

So far, we have talked about data collection and running experiments, and the corresponding fallacies.

While there are several issues we might encounter, there are also ongoing efforts to develop solutions that mitigate them.

One such solution, in line with reproducible research, is **recommendation checklists**.

## But first...let's hear it from you!

Time for another [Google Form](https://forms.gle/tVR38jUqNsrzigCN9) (**5 mins**)

<center>
<div>
<img src="../Images/Lecture-5/qsn_checklists.png" width="600" alt='Recommendation checklists'/>
</div>
</center>

## Data guidelines and recommendations

One recommendation for addressing the aforementioned issues is to write datasheets for datasets [[Gebru et al., 2021](https://cacm.acm.org/magazines/2021/12/256932-datasheets-for-datasets/abstract)]

For dataset **creators**:

- Creation process $\rightarrow$ reproducing a dataset, creating a new one, etc.
- Distribution
- Maintenance
- Assumptions
- Risks, harms, implications of use, biases, limitations

For dataset **consumers**:

- Information to make informed decisions about using the dataset
- Transparency for data selection
- Avoid unintentional misuse

Datasheets may also be valuable to policy makers, consumer advocates, investigative journalists, individuals whose data is included in datasets, and individuals who may be impacted by models trained or evaluated using datasets.

Datasheets also facilitate reproducibility: researchers and practitioners without access to a dataset may be able to use the information in its datasheet to create alternative datasets with similar characteristics.

Datasheets are **not** meant to be prescriptive

$\rightarrow$ datasheets will necessarily vary depending on factors such as the domain or existing organizational infrastructure and workflows. 
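A minimal sketch of how a datasheet could be kept next to the data as a checkable object. The section names follow the question groups in Gebru et al. (2021); the `Datasheet` class, its methods, and the toy dataset name are hypothetical and only for illustration. Since datasheets are not prescriptive, the section list is meant to be adapted.

```python
from dataclasses import dataclass, field

# Section names follow the question groups in Gebru et al. (2021).
# Adapt this list to your domain: datasheets are not prescriptive.
DATASHEET_SECTIONS = [
    "Motivation",
    "Composition",
    "Collection process",
    "Preprocessing/cleaning/labeling",
    "Uses",
    "Distribution",
    "Maintenance",
]


@dataclass
class Datasheet:
    """Hypothetical container for free-text datasheet answers."""

    dataset_name: str
    answers: dict = field(default_factory=dict)  # section name -> free text

    def missing_sections(self):
        """Return the sections that have no non-empty answer yet."""
        return [s for s in DATASHEET_SECTIONS if not self.answers.get(s, "").strip()]

    def to_markdown(self):
        """Render the datasheet as a Markdown document."""
        lines = [f"# Datasheet: {self.dataset_name}", ""]
        for section in DATASHEET_SECTIONS:
            lines.append(f"## {section}")
            lines.append(self.answers.get(section, "").strip() or "_Not answered yet._")
            lines.append("")
        return "\n".join(lines)


sheet = Datasheet(
    dataset_name="toy-dataset",  # hypothetical dataset
    answers={"Motivation": "Created to illustrate the datasheet structure."},
)
print(sheet.missing_sections())  # sections still to be written
print(sheet.to_markdown())
```

Keeping the list of unanswered sections visible makes it easier to write the datasheet alongside the dataset rather than as an afterthought.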

<center>
<div>
<img src="../Images/Lecture-5/datasheet1.png" width="1500" alt='datasheet1'/>
</div>
</center>

<center>
<div>
<img src="../Images/Lecture-5/datasheet2.png" width="1000" alt='datasheet2'/>
</div>
</center>

<center>
<div>
<img src="../Images/Lecture-5/datasheet3.png" width="1000" alt='datasheet3'/>
</div>
</center>

<center>
<div>
<img src="../Images/Lecture-5/datasheet4.png" width="1000" alt='datasheet4'/>
</div>
</center>

## A concrete example

Check "*Other material/Lecture-5/argscichat_datasheet.pdf*"

## Data Cards [[Pushkarna et al., 2022]](https://dl.acm.org/doi/pdf/10.1145/3531146.3533231)

A clear and thorough understanding of a dataset’s origins, development, intent, ethical considerations and evolution becomes a necessary step for the responsible and informed deployment of models, especially those in people-facing contexts and high-risk domains.

$\rightarrow$ this depends on the intelligibility, conciseness, and comprehensiveness of the documentation!

Data Cards are structured summaries of essential facts about various aspects of ML datasets needed by stakeholders across a dataset’s lifecycle for responsible AI development. They cover:

- explanations of the processes and rationales that shape the data, including upstream processes
- training and evaluation methods
- intended use of the data (*recall annotation paradigms!*)
- decisions affecting model performance

### Data Card principles

The authors develop guidelines for the successful and appropriate adoption of Data Cards in practice and at scale.

#### Consistent

Data Cards must be comparable to one another, regardless of data modality or domain, such that claims are easy to interpret and validate within the context of use.

A Data Card creation effort should solicit equitable information from all datasets.

#### Comprehensive

Rather than being created as a last step in a dataset’s lifecycle, it should be easy to create a Data Card concurrently with the dataset.

This requires standardized methods that extend beyond the Data Card, and apply to the various reports generated in the dataset’s lifecycle.

#### Intelligible and Concise

A Data Card should efficiently communicate to the reader with the least proficiency, while enabling readers with greater proficiency to find more information as needed. 

The content and design should advance a reader’s deliberation process without overwhelming them, and encourage stakeholder cooperation towards a shared mental model of the dataset for decision-making.

#### Explainability, Uncertainty

Clear descriptions and justifications for uncertainty can lead to additional measures to mitigate risks, leading to opportunities for fairer and equitable models. 

This builds greater trust in the dataset and subsequently, its publishers.

<center>
<div>
<img src="../Images/Lecture-5/datacard-template.png" width="1000" alt='datacard template'/>
</div>
</center>

<center>
<div>
<img src="../Images/Lecture-5/datacard-themes.png" width="1000" alt='datacard themes'/>
</div>
</center>

### A concrete example

Check "*Other material/Lecture-5/datacards.pdf*"

## Model Cards [[Mitchell et al., 2019](https://dl.acm.org/doi/10.1145/3287560.3287596)]

Similarly to datasheets for datasets, there exist **datasheets for models**.

Model cards offer a **middle ground** for peer-review processes, since code checking is time-consuming.

**Objective**: To clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited.

Focus on trained model performance characteristics: 

- Type of model
- Intended use cases
- Attributes for which performance may vary
- Measures of model performance (their motivation)
- Ethical considerations
- Target users

Useful to understand the **systematic impacts** of models before their deployment.

Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across

- different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, etc.)
- intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type)

that are relevant to the intended application domains.

Model cards aim to standardize ethical practice and reporting.

$\rightarrow$ allowing stakeholders to compare candidate models for deployment across traditional evaluation metrics, ethical, inclusive, and fair considerations.
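Disaggregated evaluation of this kind can be sketched in a few lines. The example below computes accuracy per group and per intersectional group on toy records; the attribute names (`sex`, `age`), the data, and the helper `accuracy_by_group` are assumptions made for illustration, not part of the model card proposal.

```python
from collections import defaultdict


def accuracy_by_group(records, group_keys):
    """Accuracy for every observed combination of the attributes in group_keys.

    An empty group_keys list yields a single group, i.e. the overall accuracy.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in group_keys)
        total[group] += 1
        correct[group] += int(record["label"] == record["prediction"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy predictions with two sensitive attributes.
records = [
    {"sex": "F", "age": "young", "label": 1, "prediction": 1},
    {"sex": "F", "age": "old", "label": 1, "prediction": 0},
    {"sex": "M", "age": "young", "label": 0, "prediction": 0},
    {"sex": "M", "age": "old", "label": 1, "prediction": 1},
    {"sex": "F", "age": "old", "label": 0, "prediction": 1},
    {"sex": "M", "age": "young", "label": 1, "prediction": 1},
]

overall = accuracy_by_group(records, [])               # single aggregate number
by_sex = accuracy_by_group(records, ["sex"])           # one attribute
by_sex_age = accuracy_by_group(records, ["sex", "age"])  # intersectional groups

for name, result in [("overall", overall), ("sex", by_sex), ("sex x age", by_sex_age)]:
    print(name, result)
```

In this toy data the overall accuracy (4 out of 6) hides that the model is wrong on every example of one intersectional group, which is the kind of failure a model card is meant to surface.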

### Use cases

Model reporting will hold different meaning to those involved in different aspects of model development, deployment, and use.

#### ML and AI practitioners

They can better understand how well the model might work for the intended use cases and track its performance over time.

#### Model developers

They can compare the model’s results to other models in the same space, and make decisions about training their own system.

#### Software developers

Software developers working on products that use the model’s predictions can inform their design and implementation decisions.

#### Policymakers

They can understand how a machine learning system may fail or succeed in ways that impact people.

#### Organizations

They can inform decisions about adopting technology that incorporates machine learning.

#### ML-knowledgeable individuals

They can be informed on different options for fine-tuning, model combination, or additional rules and constraints to help curate models for intended use cases without requiring technical expertise.

#### Impacted individuals

Individuals who may experience effects from a model can better understand how it works or use information in the card to pursue remedies.

<center>
<div>
<img src="../Images/Lecture-5/model-card.png" width="450" alt='model-card'/>
</div>
</center>

<center>
<div>
<img src="../Images/Lecture-5/model-card-example.png" width="800" alt='model-card-example'/>
</div>
</center>

## Model info sheets [[Kapoor and Narayanan, 2022]](https://www.sciencedirect.com/science/article/pii/S2666389923001599)

Inspired by Model cards [[Mitchell et al., 2019](https://dl.acm.org/doi/10.1145/3287560.3287596)]

A model info sheet requires the researcher to provide precise arguments to justify that models used for making scientific claims do not suffer from leakage, by answering 21 questions based on their taxonomy of leakage.
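To make the kind of argument an info sheet asks for more concrete, here is a minimal sketch of a check for one kind of leakage: examples duplicated across the train/test split. It is an illustration under stated assumptions, not one of the 21 questions; the function name `find_train_test_overlap`, the normalization rule, and the toy sentences are all hypothetical.

```python
def find_train_test_overlap(train_texts, test_texts):
    """Return the test examples that also occur in the training set.

    Two texts count as duplicates if they are equal after lowercasing
    and collapsing whitespace (an assumption; real data may need more).
    """

    def normalize(text):
        return " ".join(text.lower().split())

    train_seen = {normalize(text) for text in train_texts}
    return [text for text in test_texts if normalize(text) in train_seen]


# Hypothetical toy split.
train = ["The model works well.", "Data leakage is subtle."]
test = ["data  leakage is SUBTLE.", "A genuinely unseen example."]

overlap = find_train_test_overlap(train, test)
print(f"{len(overlap)} of {len(test)} test examples also appear in train: {overlap}")
```

A passing check like this supports only one claim in the sheet; other leakage sources, such as preprocessing fitted on the test data, need their own arguments.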

### Limitations

- **Cannot be verified** in the **absence** of computational reproducibility (code availability!)
- **Incorrect claims** in the info sheet $\rightarrow$ false assurance (*I'm only human after all...*)

### Model info sheet template

Check "*Other material/Lecture-5/model-info-sheet-template.docx*"

## FactSheets [[Arnold et al., 2019](https://ieeexplore.ieee.org/document/8843893)]

Considerations beyond accuracy, such as safety (which includes fairness and explainability), security, and provenance, are also critical elements to engender consumers’ trust in a service.

FactSheets are documents that contain purpose, performance, safety, security, and provenance information, to be completed by AI service providers for examination by consumers.

A FactSheet is modeled after a supplier’s declaration of conformity (SDoC). 

An SDoC is a document to “show that a product, process or service conforms to a standard or technical regulation, in which a supplier provides written assurance of conformity to the specified requirements”

$\rightarrow$ used in many different industries and sectors including telecommunications and transportation

$\rightarrow$ SDoCs are often voluntary and tests reported in SDoCs are conducted by the supplier itself rather than by third parties.

In summary, FactSheets cover several aspects.

#### Statement of Purpose

- General: who is the supplier, what is the service about, output of the service, what algorithms, etc.
- Usage: what is the intended use of the service output
- Domains and Applications: which domains and applications the service was tested on or used for, how the service is being used

#### Basic Performance

- Testing by the service provider: which datasets, which testing methodology, which results
- Testing by third parties: is there a way to verify the performance metrics, was the service tested by any third party

#### Safety

- General: aware of biases, ethical issues, safety risks, informed consent
- Explainability: is the service output explainable/interpretable?
- Fairness: was each dataset checked for biases, any bias detection implemented
- Concept drift: expected performance on seen and unseen data, is the system updated with new data

#### Security

- How could the service be attacked
- How is data secured
- Is the service robust to adversarial attacks

#### Lineage

- Training data: as-is model availability, training data availability
- Trained models: model training details, model last update

### FactSheet Template

Check "*Other material/Lecture-5/factsheets.pdf*"

## Many other ways for reporting

Check JAIR paper for more related work

- Checklists [[Han et al., 2017](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0183591)] $\rightarrow$ mainly for reviewing and community specific
- Data statements [[Bender & Friedman, 2018](https://aclanthology.org/Q18-1041.pdf)] $\rightarrow$ to alleviate issues related to exclusion and bias in NLP
- The Dataset Nutrition Label [[Holland et al., 2018](https://arxiv.org/pdf/1805.03677.pdf)] $\rightarrow$ Guidelines for analyzing data structure and status

[❓] Do you see any drawback?

State that they are quite general on average $\rightarrow$ JAIR paper for a concrete example

Make questionnaire for evaluation

State that nobody is going to use these recommendations

## Concluding Remarks


# Any questions?

<center>
<div>
<img src="../Images/Lecture-2/jojo-arrivederci.gif" width="1000" alt='JOJO_arrivederci'/>
</div>
</center>