# Approaches

-----

### Imbalance Ratio
Training set:
- "good" samples: 1608 (~90%)
- "bad" samples: 171 (~10%)
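The percentages follow directly from the raw counts (1608 + 171 = 1779 training samples). Here is a quick check in Python that uses only the two counts listed above:

```python
# Training-set counts from the list above.
counts = {"good": 1608, "bad": 171}
total = sum(counts.values())  # 1779 samples in total

for label, n in counts.items():
    # Share of each class, formatted as a percentage with one decimal.
    print(f"{label}: {n} samples -> {n / total:.1%}")

# Majority-to-minority ratio: about 9.4 "good" samples per "bad" sample.
imbalance_ratio = counts["good"] / counts["bad"]
print(f"imbalance ratio: {imbalance_ratio:.1f}:1")
```

This prints 90.4% and 9.6%, which is the ~90/10 split, and an imbalance ratio of about 9.4:1.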


----

### Standard Pipeline



#### 0. Data Visualization

```
 - Class distribution plot
 - PCA/t-SNE of learned features (e.g., autoencoder embeddings)
 - Grad-CAM to visualize where the model attends
```
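PCA for a 2D scatter plot does not need a framework. The following is a minimal pure-Python sketch. It assumes each image has already been turned into a feature vector, for example an autoencoder embedding. `pca_2d` is a hypothetical helper that finds the top two principal components by power iteration with deflation. t-SNE and Grad-CAM are not shown, and in practice a library PCA would be the usual choice.

```python
import math
import random


def pca_2d(points, iters=200, seed=0):
    """Project feature vectors onto their top-2 principal components.

    Uses power iteration on the covariance matrix, deflating after each
    component. Hypothetical helper for illustration only.
    """
    n, d = len(points), len(points[0])
    # Center the data so the covariance is computed around the mean.
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    X = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)]
           for a in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # Multiply by the covariance matrix and renormalize.
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along v.
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
        components.append(v)
        # Deflate: remove this component's variance before finding the next.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    # 2D coordinates of every point, ready for a scatter plot.
    return [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]


# Toy 3-D "embeddings" for four images; real inputs would be model features.
features = [[1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [3.0, 0.5, 2.2], [3.1, 0.4, 2.0]]
print(pca_2d(features))
```

Each output row can then be plotted and colored by its "good"/"bad" label to see whether the classes separate.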

#### 1. Imbalanced dataset handling


```
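- Class weighting (loss weights inversely proportional to class frequency)
- Focal loss (down-weights easy examples so training focuses on hard, misclassified ones)
- Oversampling the minority class with data augmentation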
```
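Class weighting and focal loss both reduce to small formulas. The following is a minimal sketch that assumes binary labels with 1 as the positive class. `balanced_class_weights` uses the common "balanced" heuristic, n_samples / (n_classes * n_class_samples), which is the same rule as scikit-learn's `class_weight="balanced"`. `binary_focal_loss` implements the standard focal loss, -alpha_t * (1 - p_t)^gamma * log(p_t), with the usual defaults gamma = 2 and alpha = 0.25. Both function names are hypothetical, and oversampling is not shown.

```python
import math


def balanced_class_weights(counts):
    """Weight per class = n_samples / (n_classes * n_samples_in_class)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    y_true holds 0/1 labels, p_pred the predicted probability of class 1.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)**gamma shrinks the loss of easy, confident examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


weights = balanced_class_weights({"good": 1608, "bad": 171})
print(weights)  # "good" ~0.55, "bad" ~5.20

# An easy example (p_t = 0.9) contributes far less with gamma=2 than with gamma=0.
print(binary_focal_loss([1], [0.9], gamma=0.0), binary_focal_loss([1], [0.9]))
```

With these weights, each class contributes the same total weight (1608 × 0.553 ≈ 171 × 5.20 ≈ 889.5, which is 1779 / 2). Equalizing those totals is the purpose of balancing.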
 
 
 
#### 2. Image Pre-processing

```
 - Contour detection
 - Edge detection
 - Color adjustments (optional)
 - Data augmentation (Cutout, CutMix, MixUp, AutoAugment)
 ```
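Two of the listed augmentations are simple enough to sketch directly. The following minimal pure-Python version assumes grayscale images stored as nested lists and one-hot labels. MixUp blends two samples and their labels with a weight λ ~ Beta(α, α), and Cutout zeroes a random square patch. `mixup` and `cutout` are hypothetical helpers. CutMix and AutoAugment are not shown, and a real pipeline would use an image library.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """MixUp: convex combination of two images and their one-hot labels."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
         for r1, r2 in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def cutout(img, size, rng=random):
    """Cutout: return a copy of img with a size x size patch set to 0.

    The patch is centered on a random pixel and clipped at the borders.
    """
    h, w = len(img), len(img[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    r0, r1 = max(0, cy - size // 2), min(h, cy - size // 2 + size)
    c0, c1 = max(0, cx - size // 2), min(w, cx - size // 2 + size)
    out = [row[:] for row in img]  # leave the original image untouched
    for i in range(r0, r1):
        for j in range(c0, c1):
            out[i][j] = 0
    return out


rng = random.Random(0)
img = [[1] * 8 for _ in range(8)]
augmented = cutout(img, size=4, rng=rng)
mixed_x, mixed_y = mixup(img, [1.0, 0.0], augmented, [0.0, 1.0], rng=rng)
```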
 
#### 4. Training

```
 - Print a couple of model outputs as a sanity check
 - Train on a small subset with both a simple and an advanced model
 - Transfer learning (fine-tuning)
 - Cross-validation (stratified, so each fold keeps the ~90/10 class ratio)
 ```
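With a 90/10 split, a plain random split can leave a fold with very few "bad" samples. The following minimal sketch shows stratified k-fold splitting: indices are grouped by class, shuffled, and dealt round-robin to the folds. `stratified_kfold` is a hypothetical helper, and scikit-learn's `StratifiedKFold` provides the same idea as a library class.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with (nearly) equal class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal this class's indices round-robin so each fold gets its share.
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds


labels = ["good"] * 1608 + ["bad"] * 171
folds = stratified_kfold(labels, k=5)
for i, val_idx in enumerate(folds):
    # Train on all other folds, validate on fold i.
    n_bad = sum(labels[j] == "bad" for j in val_idx)
    print(f"fold {i}: {len(val_idx)} validation samples, {n_bad} bad")
```

With this split, every validation fold receives 34 or 35 of the 171 "bad" samples.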
 
 
#### 5. Parameter Tuning

 ```
 - Hyperparameter search (KerasTuner; grid or random search)
 - Learning-rate scheduling (exponential decay, time-based decay, ...)
 - Regularization (L1/L2, dropout, batch normalization)
 - Loss selection (e.g., focal loss)
 - Optimizer selection
 ```
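The two decay schedules named above are one-line formulas. The following sketch assumes the common definitions. Exponential decay is lr0 * rate^(step / decay_steps), the formula used by Keras' `tf.keras.optimizers.schedules.ExponentialDecay`. Time-based decay is lr0 / (1 + decay * epoch). The function names and default values are illustrative, not tuned.

```python
def exponential_decay(lr0, step, decay_rate=0.96, decay_steps=1000):
    """Exponential decay: lr0 * decay_rate ** (step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)


def time_based_decay(lr0, epoch, decay=0.01):
    """Time-based decay: lr0 / (1 + decay * epoch)."""
    return lr0 / (1.0 + decay * epoch)


# Compare both schedules for an initial learning rate of 1e-3.
for epoch in (0, 10, 50, 100):
    exp_lr = exponential_decay(1e-3, epoch, decay_rate=0.9, decay_steps=10)
    time_lr = time_based_decay(1e-3, epoch, decay=0.1)
    print(f"epoch {epoch:3d}: exponential={exp_lr:.2e}  time-based={time_lr:.2e}")
```

The printed table shows how quickly each schedule shrinks the learning rate for a given choice of parameters, which helps when picking `decay_rate` or `decay`.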

#### 6. Evaluation

```
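 - Check that the initial loss is not greater than ~1: with the labels
   below, an untrained model that outputs 0.5 for every sample should
   start at ln 2 ≈ 0.693
 
      0s: 10%
      1s: 90%
      -0.1·ln(0.5) - 0.9·ln(0.5) = -ln(0.5) = ln(2) ≈ 0.693
      
 - Precision/recall curve
 - ROC curve and AUC
 - Confusion matrix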
 ```
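The initial-loss arithmetic and the confusion-matrix metrics can be checked with a few lines of Python. The following is a minimal sketch in which `binary_cross_entropy` and `confusion_counts` are hypothetical helpers. The 10%/90% label vector mirrors the example above. Treating the minority label 0 as the positive class is an assumption, made here to show why accuracy alone is misleading on this data.

```python
import math


def binary_cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy for 0/1 labels and predicted P(y=1)."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# 10% zeros and 90% ones, matching the example above.
y_true = [0] * 10 + [1] * 90

# Constant 0.5 prediction -> ln 2 ~ 0.693 (the value to expect at initialization).
print(binary_cross_entropy(y_true, [0.5] * 100))
# Constant prediction equal to the class prior (0.9) -> ~0.325.
print(binary_cross_entropy(y_true, [0.9] * 100))

# Always predicting the majority class gives 90% accuracy but never finds a 0.
tp, fp, fn, tn = confusion_counts(y_true, [1] * 100, positive=0)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, fp, fn, tn, precision, recall)  # 0 0 10 90 0.0 0.0
```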

---

### Special Approaches

 
#### Changing the classifier head

 ```
 - Extract features with a pretrained model
 - Classify the features with an SVM or a decision tree
 ```
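Once features exist, the two-stage idea (a frozen feature extractor followed by a classic classifier) is easy to sketch. The following minimal pure-Python sketch assumes the pretrained model has already produced one feature vector per image. The vectors below are made-up stand-ins, and the label mapping 0 = good, 1 = bad is an assumption. `fit_stump` is a hypothetical helper that fits a depth-1 decision tree (a decision stump) by minimizing training errors. This is a simplification of the Gini/entropy criteria that full decision trees use. In practice, scikit-learn's `SVC` or `DecisionTreeClassifier` would be the usual choice.

```python
def fit_stump(features, labels):
    """Fit a depth-1 decision tree on feature vectors with 0/1 labels.

    Tries every (feature, threshold, polarity) and keeps the split with the
    fewest training errors. Returns a predict(x) function.
    """
    best = None
    for f in range(len(features[0])):
        values = sorted({x[f] for x in features})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in thresholds:
            for polarity in (0, 1):
                # Predict `polarity` above the threshold, the other class below.
                errors = sum((polarity if x[f] > t else 1 - polarity) != y
                             for x, y in zip(features, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    _, f, t, polarity = best
    return lambda x: polarity if x[f] > t else 1 - polarity


# Hypothetical 2-D embeddings from a frozen pretrained network (0 = good, 1 = bad).
train_features = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.3], [0.9, 0.1]]
train_labels = [0, 0, 1, 1]
predict = fit_stump(train_features, train_labels)
print([predict(x) for x in train_features])  # [0, 0, 1, 1]
```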

#### Unsupervised learning

```
- SimCLR
- k-means clustering with PCA
```
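k-means itself is short. The following is a minimal pure-Python sketch of Lloyd's algorithm. It assumes the inputs are already low-dimensional feature vectors, for example PCA-reduced embeddings. `kmeans` is a hypothetical helper that initializes from randomly chosen data points. Neither the PCA step nor SimCLR is shown.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance.
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

    assign = None
    for _ in range(iters):
        new_assign = [nearest(p) for p in points]
        if new_assign == assign:
            break  # converged: assignments stopped changing
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assign = kmeans(points, k=2)
print(assign)
```

On unlabeled data, the resulting clusters can be compared with the "good"/"bad" labels afterwards to check whether the features separate the classes without supervision.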

#### Similarity Learning

```
- Siamese Networks
```
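A Siamese network passes two images through the same (weight-sharing) encoder and trains on the distance between the two embeddings. The following is a minimal sketch of the classic contrastive loss for one pair. It assumes the embeddings are already computed; the encoder is omitted and the vectors are toy values. `contrastive_loss` is a hypothetical helper, and some formulations add a factor of 1/2 to both terms.

```python
import math


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    same=True:  pull the pair together, loss = d**2
    same=False: push the pair at least `margin` apart, loss = max(0, margin - d)**2
    where d is the Euclidean distance between the embeddings.
    """
    d = math.dist(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Two "good" images with nearby embeddings, and a "good"/"bad" pair.
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same=True))    # d = 0.5
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same=False))   # inside the margin
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))   # 0.0, already far apart
```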

---

# Suppose we have some input data describing a graph of relationships between parents and children over multiple generations. The data is formatted as a list of (parent, child) pairs, where each individual is assigned a unique positive integer identifier.

# For example, in this diagram, 3 is a child of 1 and 2, and 5 is a child of 4:

# 1   2    4   15
#  \ /   / | \ /
#   3   5  8  9
#    \ / \     \
#     6   7    11


# Sample input/output (pseudodata):

# parent_child_pairs = [
#     (1, 3), (2, 3), (3, 6), (5, 6), (15, 9),
#     (5, 7), (4, 5), (4, 8), (4, 9), (9, 11)
# ]


# Write a function that takes this data as input and returns two collections:
#   - one containing all individuals with zero known parents, and
#   - one containing all individuals with exactly one known parent.


# Output may be in any order:

# find_nodes_with_zero_and_one_parents(parent_child_pairs) => [
#   [1, 2, 4, 15],       # Individuals with zero parents
#   [5, 7, 8, 11]        # Individuals with exactly one parent
# ]

# Complexity: O(n) time and space, where n is the number of pairs in the input.

parent_child_pairs = [
    (1, 3), (2, 3), (3, 6), (5, 6), (15, 9),
    (5, 7), (4, 5), (4, 8), (4, 9), (9, 11)]

def findParentCases(parent_child_pairs):
    """Return (zeroParent, oneParent) for a list of (parent, child) pairs."""

    # parentCounter maps each individual to its number of known parents.
    # For the sample input it ends up as:
    # {1: 0, 3: 2, 2: 0, 6: 2, 5: 1, 15: 0, 9: 2, 7: 1, 4: 0, 8: 1, 11: 1}
    parentCounter = {}

    for parent, child in parent_child_pairs:
        # Register the parent with 0 known parents unless it was already
        # seen (e.g. earlier as someone's child).
        parentCounter.setdefault(parent, 0)
        # Every pair adds exactly one known parent to the child.
        parentCounter[child] = parentCounter.get(child, 0) + 1

    zeroParent = [k for k, v in parentCounter.items() if v == 0]
    oneParent = [k for k, v in parentCounter.items() if v == 1]

    return zeroParent, oneParent


if __name__ == "__main__":

    zeroParent, oneParent = findParentCases(parent_child_pairs)

    print("Zero parent individuals: ", zeroParent)  # [1, 2, 15, 4]
    print("One parent individuals: ", oneParent)    # [5, 7, 8, 11]