# Introduction
The goal of this notebook is to detect anomalies in item records so that lists can be exported for the acquisitions librarians. The script consists of the following parts:

1. Viewing the item records table

2. A series of queries on each column of the table
    + Barcode anomalies


# 1. Viewing the item records table

In [14]:
import pandas as pd
from datetime import datetime

from kiblib.utils.db import DbConn

In [2]:
db_conn = DbConn().create_engine()

We define the variable **query** as an SQL query that selects the fields of the *items* table and adds the *itemtype* field from the *biblioitems* table (joined on *biblionumber*).

In [3]:
query = """SELECT i.itemnumber, i.biblionumber, i.biblioitemnumber, i.barcode, i.dateaccessioned, i.booksellerid, i.homebranch, i.price, i.replacementprice, i.replacementpricedate, i.datelastborrowed, i.datelastseen, i.stack, i.notforloan, i.damaged, i.damaged_on, i.itemlost, i.itemlost_on, i.withdrawn, i.withdrawn_on, i.itemcallnumber, i.coded_location_qualifier, i.issues, i.renewals, i.reserves, i.restricted, i.itemnotes, i.itemnotes_nonpublic, i.holdingbranch,i.timestamp, i.location, i.permanent_location, i.onloan, i.cn_source, i.cn_sort, i.ccode, i.materials, i.uri, i.itype, i.more_subfields_xml, i.enumchron, i.copynumber, i.stocknumber, i.new_status, i.exclude_from_local_holds_priority, bi.itemtype
FROM koha_prod.items i
JOIN koha_prod.biblioitems bi ON bi.biblionumber = i.biblionumber """
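
The join can be tried without access to the Koha database. The sketch below uses an in-memory SQLite database. Its two tables are hypothetical, heavily reduced stand-ins for *items* and *biblioitems*, with only a few columns and made-up rows. `pd.read_sql()` accepts a plain `sqlite3` connection, which takes the place of *db_conn* here.

```python
import sqlite3

import pandas as pd

# Hypothetical miniature versions of the items and biblioitems tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE biblioitems (biblioitemnumber INTEGER, biblionumber INTEGER, itemtype TEXT);
CREATE TABLE items (itemnumber INTEGER, biblionumber INTEGER, barcode TEXT, homebranch TEXT);
INSERT INTO biblioitems VALUES (1, 1, 'LI'), (2, 2, 'DC');
INSERT INTO items VALUES (10, 1, 'C0000000001', 'MED'),
                         (11, 1, NULL, 'MED'),
                         (12, 2, 'C0000000002', 'BUS');
""")

# Same pattern as the notebook query: each item row receives the itemtype
# of its bibliographic record through the shared biblionumber
demo_query = """
SELECT i.itemnumber, i.biblionumber, i.barcode, i.homebranch, bi.itemtype
FROM items i
JOIN biblioitems bi ON bi.biblionumber = i.biblionumber
ORDER BY i.itemnumber
"""
demo = pd.read_sql(demo_query, conn)
print(demo)
```

Both items 10 and 11 inherit the itemtype `LI` from bibliographic record 1. The SQL NULL barcode of item 11 arrives in pandas as a missing value.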

We then build the variable *items* by passing *query* and the connection *db_conn* to the *pd.read_sql()* function, and display *items*.

In [4]:
items = pd.read_sql(query, db_conn)
items

|  | itemnumber | biblionumber | biblioitemnumber | barcode | dateaccessioned | booksellerid | homebranch | price | replacementprice | replacementpricedate | ... | materials | uri | itype | more_subfields_xml | enumchron | copynumber | stocknumber | new_status | exclude_from_local_holds_priority | itemtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | C0001353993 | 2005-03-22 |  | MED | 8.99 | 8.99 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 1 | 3 | 1 | 1 | C0000653853 | 2005-03-22 |  | MED | 8.99 | 8.99 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 2 | 4 | 1 | 1 | C0003476991 | 2005-03-22 |  | MED | 1.00 | 1.00 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 3 | 5 | 1 | 1 | C0001499529 | 2005-03-22 |  | MED | 8.99 | 8.99 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 4 | 8 | 1 | 1 | C0001353935 | 2005-03-22 |  | MED | 8.99 | 8.99 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 287637 | 451111 | 350870 | 350870 | C3100010662 | 2022-10-19 |  | MED |  |  | 2022-10-19 | ... |  |  | PRETSON | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | DC |
| 287638 | 451114 | 350871 | 350871 | C3100010663 | 2022-10-19 |  | MED |  |  | 2022-10-19 | ... |  |  | PRETSON | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | DC |
| 287639 | 451122 | 350872 | 350872 | C3100010664 | 2022-10-19 |  | MED |  |  | 2022-10-19 | ... |  |  | PRETSON | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | DC |
| 287640 | 451125 | 350873 | 350873 | C3100010665 | 2022-10-19 |  | MED |  |  | 2022-10-19 | ... |  |  | PRETSON | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | DC |


In [16]:
# convert the date strings into datetime values
items['dateaccessioned'] = pd.to_datetime(items['dateaccessioned'])

In [20]:
# a datetime Series exposes .year through the .dt accessor (Series.year raises AttributeError)
items['dateaccessioned'].dt.year
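
Here is a minimal sketch of the date handling on made-up values. `pd.to_datetime()` converts the strings. With `errors="coerce"`, an option not used in the cell above, unparseable entries become `NaT` instead of raising an error, which can itself help spot date anomalies. The year is then read through the `.dt` accessor and counted per year.

```python
import pandas as pd

# Made-up accession dates stored as strings, including one invalid entry
dates = pd.DataFrame({"dateaccessioned": ["2005-03-22", "2021-07-27", "2022-10-19",
                                          "2022-10-12", "not a date"]})

# errors="coerce" turns the invalid string into NaT rather than raising
dates["dateaccessioned"] = pd.to_datetime(dates["dateaccessioned"], errors="coerce")

# .year is reached through the .dt accessor; value_counts() skips the NaT year
per_year = dates["dateaccessioned"].dt.year.value_counts().sort_index()
print(per_year)
```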

In [5]:
items['homebranch'].value_counts(normalize=True)

MED    0.960871
MUS    0.025358
BUS    0.013771
Name: homebranch, dtype: float64
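
`value_counts(normalize=True)` returns proportions rather than counts. This small sketch puts both side by side on made-up branch codes and expresses the share as a percentage.

```python
import pandas as pd

# Made-up branch codes
branches = pd.Series(["MED", "MED", "MED", "BUS", "MUS", "MED"], name="homebranch")

counts = branches.value_counts()                 # absolute number of items per branch
shares = branches.value_counts(normalize=True)   # proportions, summing to 1

# Align both results on the branch code, share rounded to one decimal place
summary = pd.DataFrame({"count": counts, "share_pct": (shares * 100).round(1)})
print(summary)
```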

In [6]:
items[items['homebranch'].isna()]  # equivalent here to filtering with a condition (SQL WHERE)

|  | itemnumber | biblionumber | biblioitemnumber | barcode | dateaccessioned | booksellerid | homebranch | price | replacementprice | replacementpricedate | ... | materials | uri | itype | more_subfields_xml | enumchron | copynumber | stocknumber | new_status | exclude_from_local_holds_priority | itemtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
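
This sketch makes the WHERE analogy concrete. It runs the same filter both ways on a hypothetical three-row SQLite table: once as a boolean mask built with `.isna()` in pandas, and once as `WHERE homebranch IS NULL` in SQL.

```python
import sqlite3

import pandas as pd

# Hypothetical toy table with one item missing its home branch
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (itemnumber INTEGER, homebranch TEXT);
INSERT INTO items VALUES (1, 'MED'), (2, NULL), (3, 'BUS');
""")
toy_items = pd.read_sql("SELECT * FROM items", conn)

# pandas: the boolean mask plays the role of the WHERE clause
missing_pd = toy_items[toy_items["homebranch"].isna()]

# SQL equivalent, run against the same data
missing_sql = pd.read_sql("SELECT * FROM items WHERE homebranch IS NULL", conn)

print(missing_pd["itemnumber"].tolist(), missing_sql["itemnumber"].tolist())
```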


# 2. Barcode anomalies

Here we define the variable *barcode* as the list of item records that have no barcode. We then display the result by evaluating *barcode*.

In [7]:
barcode = items[items['barcode'].isna()]
barcode

|  | itemnumber | biblionumber | biblioitemnumber | barcode | dateaccessioned | booksellerid | homebranch | price | replacementprice | replacementpricedate | ... | materials | uri | itype | more_subfields_xml | enumchron | copynumber | stocknumber | new_status | exclude_from_local_holds_priority | itemtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45352 | 430326 | 70043 | 70043 |  | 2021-07-27 |  | MED |  |  | 2021-07-27 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 55543 | 432377 | 84901 | 84901 |  | 2021-09-18 |  | MED |  |  | 2021-09-18 | ... |  |  | PRETLIV |  |  |  |  |  |  | PA |
| 70539 | 439857 | 103787 | 103787 |  | 2022-02-18 |  | BUS | 4.8 | 4.8 | 2022-02-18 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 88437 | 142053 | 125932 | 125932 |  | 2005-03-25 |  | MED | 40.0 | 40.0 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 108384 | 394939 | 154693 | 154693 |  | 2019-06-12 |  | MED |  |  | 2019-06-12 | ... |  |  | PRETPER |  |  |  |  |  |  | PE |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 287516 | 450639 | 350662 | 350662 |  | 2022-10-12 |  | MED |  |  | 2022-10-12 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 287517 | 450640 | 350663 | 350663 |  | 2022-10-12 |  | MED |  |  | 2022-10-12 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 287518 | 450641 | 350664 | 350664 |  | 2022-10-12 |  | MED |  |  | 2022-10-12 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 287519 | 450642 | 350665 | 350665 |  | 2022-10-12 |  | MED |  |  | 2022-10-12 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
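
`.isna()` only catches NULL barcodes. Some barcodes might instead be stored as empty or whitespace-only strings; this is an assumption, since nothing in the output above shows it. Such barcodes would slip through the filter. Here is a sketch of a broader test on made-up data.

```python
import pandas as pd

# Made-up barcodes: one valid, one NULL, one empty, one whitespace-only
toy = pd.DataFrame({"itemnumber": [1, 2, 3, 4],
                    "barcode": ["C0001353993", None, "", "   "]})

# Missing OR blank once surrounding spaces are stripped
# (str.strip() leaves NaN for the NULL value, and NaN.eq("") is False)
no_barcode = toy["barcode"].isna() | toy["barcode"].str.strip().eq("")
print(toy[no_barcode])
```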


For these items without a barcode, we then want to **check the loan status** (*notforloan*). To get an overview, we count the number of occurrences of each value (authorised values).

In [8]:
barcode['notforloan'].value_counts()

-1    182
-2     35
 0     14
-4      3
-3      2
Name: notforloan, dtype: int64
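
By default, `value_counts()` leaves missing values out of the count. Passing `dropna=False` reveals whether some items have no *notforloan* value at all. Here is a sketch on made-up values.

```python
import pandas as pd

# Made-up loan statuses, one of them missing
notforloan = pd.Series([-1, -1, -2, 0, None], name="notforloan")

print(notforloan.value_counts())              # NaN silently left out (4 values counted)
print(notforloan.value_counts(dropna=False))  # NaN counted as its own value (5 values)
```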

From these values, we want to filter the *notforloan* column to keep only some of the authorised values. There are **2 methods** to select values in a column (the equivalent of IN in SQL):

* **.isin**: this function keeps the rows whose value is in the given list

In [9]:
barcode[barcode['notforloan'].isin([0,-4,-3])]

|  | itemnumber | biblionumber | biblioitemnumber | barcode | dateaccessioned | booksellerid | homebranch | price | replacementprice | replacementpricedate | ... | materials | uri | itype | more_subfields_xml | enumchron | copynumber | stocknumber | new_status | exclude_from_local_holds_priority | itemtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55543 | 432377 | 84901 | 84901 |  | 2021-09-18 |  | MED |  |  | 2021-09-18 | ... |  |  | PRETLIV |  |  |  |  |  |  | PA |
| 88437 | 142053 | 125932 | 125932 |  | 2005-03-25 |  | MED | 40.0 | 40.0 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 108384 | 394939 | 154693 | 154693 |  | 2019-06-12 |  | MED |  |  | 2019-06-12 | ... |  |  | PRETPER |  |  |  |  |  |  | PE |
| 109463 | 398519 | 154781 | 154781 |  | 2019-09-13 |  | MED |  |  | 2019-09-13 | ... |  |  | PRETPER |  |  |  |  |  |  | PE |
| 123053 | 437204 | 170136 | 170136 |  | 2021-12-21 |  | MED |  |  | 2021-12-21 | ... |  |  | PRETPER |  |  |  |  |  |  | PE |
| 150391 | 398200 | 207159 | 207159 |  | 2019-09-06 |  | MED |  |  | 2019-09-06 | ... |  |  | PRETLIV | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | DV |
| 154311 | 393851 | 212532 | 212532 |  | 2019-05-23 |  | MED |  |  | 2019-05-23 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 176346 | 382325 | 239036 | 239036 |  | 2018-10-03 |  | MED |  |  | 2018-10-03 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 185234 | 285808 | 249464 | 249464 |  | 2012-09-15 |  | MED | 12.0 | 12.0 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 188623 | 444042 | 253811 | 253811 |  | 2022-05-17 |  | MED |  |  | 2022-05-17 | ... |  |  | PRETLIV | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | CA |
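
Here is a minimal sketch of `.isin` on a made-up five-row frame, with the equivalent SQL shown as a comment.

```python
import pandas as pd

# Made-up items with various loan statuses
toy = pd.DataFrame({"itemnumber": [1, 2, 3, 4, 5],
                    "notforloan": [-1, 0, -2, -4, -3]})

# SQL: SELECT * FROM toy WHERE notforloan IN (0, -4, -3)
kept = toy[toy["notforloan"].isin([0, -4, -3])]
print(kept)
```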


* **~** placed before the *.isin* condition: negates it, keeping every row whose value is *not* in the given list

In [10]:
barcode[~barcode['notforloan'].isin([-1,-2])]

|  | itemnumber | biblionumber | biblioitemnumber | barcode | dateaccessioned | booksellerid | homebranch | price | replacementprice | replacementpricedate | ... | materials | uri | itype | more_subfields_xml | enumchron | copynumber | stocknumber | new_status | exclude_from_local_holds_priority | itemtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55543 | 432377 | 84901 | 84901 |  | 2021-09-18 |  | MED |  |  | 2021-09-18 | ... |  |  | PRETLIV |  |  |  |  |  |  | PA |
| 88437 | 142053 | 125932 | 125932 |  | 2005-03-25 |  | MED | 40.0 | 40.0 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 108384 | 394939 | 154693 | 154693 |  | 2019-06-12 |  | MED |  |  | 2019-06-12 | ... |  |  | PRETPER |  |  |  |  |  |  | PE |
| 109463 | 398519 | 154781 | 154781 |  | 2019-09-13 |  | MED |  |  | 2019-09-13 | ... |  |  | PRETPER |  |  |  |  |  |  | PE |
| 123053 | 437204 | 170136 | 170136 |  | 2021-12-21 |  | MED |  |  | 2021-12-21 | ... |  |  | PRETPER |  |  |  |  |  |  | PE |
| 150391 | 398200 | 207159 | 207159 |  | 2019-09-06 |  | MED |  |  | 2019-09-06 | ... |  |  | PRETLIV | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | DV |
| 154311 | 393851 | 212532 | 212532 |  | 2019-05-23 |  | MED |  |  | 2019-05-23 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 176346 | 382325 | 239036 | 239036 |  | 2018-10-03 |  | MED |  |  | 2018-10-03 | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 185234 | 285808 | 249464 | 249464 |  | 2012-09-15 |  | MED | 12.0 | 12.0 |  | ... |  |  | PRETLIV |  |  |  |  |  |  | LI |
| 188623 | 444042 | 253811 | 253811 |  | 2022-05-17 |  | MED |  |  | 2022-05-17 | ... |  |  | PRETLIV | `<?xml version="1.0" encoding="UTF-8"?>\n<colle...` |  |  |  |  |  | CA |
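
The two filters are only interchangeable when the column holds no missing values. In the sketch below, one made-up item has no *notforloan* value. `isin` drops that row while `~isin` keeps it, because NaN never belongs to the list. SQL's `NOT IN` would exclude such a NULL row, so `~isin` is not a strict equivalent of `NOT IN`.

```python
import pandas as pd

# Made-up data: item 4 has no notforloan value
toy = pd.DataFrame({"itemnumber": [1, 2, 3, 4],
                    "notforloan": [-1, 0, -4, None]})

inclusive = toy[toy["notforloan"].isin([0, -4, -3])]  # IN (0, -4, -3)
exclusive = toy[~toy["notforloan"].isin([-1, -2])]    # negation of IN (-1, -2)

# NaN is never "in" a list, so ~isin keeps it while isin drops it
print(inclusive["itemnumber"].tolist())  # [2, 3]
print(exclusive["itemnumber"].tolist())  # [2, 3, 4]
```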


In [11]:
anomalies_barcode1 = barcode[barcode['notforloan'].isin([0,-4,-3])]

In [12]:
colonnes_a_exporter = ['barcode',
       'dateaccessioned', 'homebranch', 'price',
       'replacementprice', 'datelastborrowed',
       'datelastseen', 'notforloan', 'damaged', 'damaged_on',
       'itemlost', 'itemlost_on', 'withdrawn', 'withdrawn_on',
       'itemcallnumber','holdingbranch', 'timestamp', 'location',
       'onloan', 'ccode','itemtype']
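
Selecting `anomalies_barcode1[colonnes_a_exporter]` raises a `KeyError` if one of the listed columns is absent, for example after a change to the SQL query. The defensive sketch below runs on a hypothetical reduced frame and column list. It reports and skips missing columns instead of failing.

```python
import pandas as pd

# Hypothetical reduced frame; "damaged_on" is deliberately absent
toy = pd.DataFrame({"barcode": [None], "homebranch": ["MED"], "notforloan": [0]})
wanted = ["barcode", "homebranch", "notforloan", "damaged_on"]

# Report the requested columns that the frame does not have
missing = [c for c in wanted if c not in toy.columns]
if missing:
    print("Columns not found, skipped:", missing)

# Keep the requested order, restricted to columns that exist
export_cols = [c for c in wanted if c in toy.columns]
print(toy[export_cols])
```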

In [13]:
anomalies_barcode1[colonnes_a_exporter].to_excel('liste_anomalies1.xlsx',index=False)

# Next time: check the structure of the barcodes
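Here is a possible starting point for this check. It assumes the format seen in this notebook's sample rows (the letter C followed by 10 digits, e.g. C0001353993); the library's actual barcode rules may differ. `str.fullmatch()` (pandas 1.1 or later) tests the whole string, and `na=False` treats missing barcodes as non-matching.

```python
import pandas as pd

# Assumption: pattern inferred only from the sample barcodes shown in this notebook
BARCODE_PATTERN = r"C\d{10}"

# Made-up barcodes: one well formed, one too short, one lowercase, one missing
toy = pd.DataFrame({"itemnumber": [1, 2, 3, 4],
                    "barcode": ["C0001353993", "C31000106", "c0000653853", None]})

# fullmatch anchors the pattern to the whole string; na=False for missing barcodes
well_formed = toy["barcode"].str.fullmatch(BARCODE_PATTERN, na=False)

# Malformed = a barcode is present but does not match the expected structure
malformed = toy[toy["barcode"].notna() & ~well_formed]
print(malformed)
```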