This repository has been archived by the owner on Jan 13, 2024. It is now read-only.

improve the TD 1A 7 problem statement + Python/R communication
sdpython committed Oct 18, 2014
1 parent 4468353 commit 14eddcc
Showing 5 changed files with 479 additions and 101 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -235,6 +235,7 @@ version.txt
*/notebooks/td1a/*.dbf
*/notebooks/td1a/python*.png
*/notebooks/td1a/*.c
*/notebooks/td1a/facebook*
*/notebooks/td1a/*.cpp
*/notebooks/td1a/*.pyx
*/notebooks/td1a/*.pyd
266 changes: 247 additions & 19 deletions _doc/notebooks/td1a/td1a_cenonce_session7.ipynb
@@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:dfad1d65a14cd6a5554341e373130f232481a8f78406c9f530c0dababdc8b319"
"signature": "sha256:fac6028aa25c8b3ccda64f803385052e284e5efcc3f2027e736d96a326094bcf"
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -34,6 +34,7 @@
" * [Exercise 6](#exo6)\n",
" * [Exercise 7](#exo7)\n",
" * [Exercise 8](#exo8)\n",
"* [Extensions: degrees of separation on Facebook](#prol)\n",
"\n",
"[Dynamic programming](http://fr.wikipedia.org/wiki/Programmation\\_dynamique) is a uniform way of solving a class of optimization problems that share the same property. Suppose the problem $P$ can be split into several parts $P_1$, $P_2$, ... If $S$ is the optimal solution of problem $P$, then each part $S_1$, $S_2$, ... of that solution, applied to the corresponding subproblem, is also optimal.\n",
"\n",
@@ -51,10 +52,7 @@
"collapsed": false,
"input": [
"import pyensae\n",
"pyensae.download_data(\"matrix_distance_7398.zip\", website = \"xd\")\n",
"import pandas\n",
"df = pandas.read_csv(\"matrix_distance_7398.txt\", sep=\"\\t\")\n",
"df.head()"
"pyensae.download_data(\"matrix_distance_7398.zip\", website = \"xd\")"
],
"language": "python",
"metadata": {},
@@ -74,16 +72,45 @@
" matrix_distance_7398.txt to .\\matrix_distance_7398.txt\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
"['.\\\\matrix_distance_7398.txt']"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This file can be read with the [pandas](http://pandas.pydata.org/) module, introduced in session 10, [TD 10 : DataFrame et Matrice](http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/td1a_cenonce_session_10.html#io):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas\n",
"df = pandas.read_csv(\"matrix_distance_7398.txt\", sep=\"\\t\", header=None, names=[\"v1\",\"v2\",\"distance\"])\n",
"df.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Boulogne-Billancourt</th>\n",
" <th>Beauvais</th>\n",
" <th>85597</th>\n",
" <th>v1</th>\n",
" <th>v2</th>\n",
" <th>distance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
@@ -119,25 +146,84 @@
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"prompt_number": 4,
"text": [
" Boulogne-Billancourt Beauvais 85597\n",
"0 Courbevoie Sevran 26564\n",
"1 Colombes Alfortville 36843\n",
"2 Bagneux Marcq-En-Baroeul 233455\n",
"3 Suresnes Gennevilliers 10443\n",
"4 Lens Maubeuge 93768\n",
"\n",
"[5 rows x 3 columns]"
" v1 v2 distance\n",
"0 Courbevoie Sevran 26564\n",
"1 Colombes Alfortville 36843\n",
"2 Bagneux Marcq-En-Baroeul 233455\n",
"3 Suresnes Gennevilliers 10443\n",
"4 Lens Maubeuge 93768"
]
}
],
"prompt_number": 1
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``values`` attribute is a NumPy array and behaves like a matrix, a list of lists:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"matrice = df.values\n",
"matrice[:5]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"array([['Courbevoie', 'Sevran', 26564],\n",
" ['Colombes', 'Alfortville', 36843],\n",
" ['Bagneux', 'Marcq-En-Baroeul', 233455],\n",
" ['Suresnes', 'Gennevilliers', 10443],\n",
" ['Lens', 'Maubeuge', 93768]], dtype=object)"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, use the small example presented in session 4 on files, [TD 4 : Modules, fichiers, expressions r\u00e9guli\u00e8res](http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/td1a_cenonce_session4.html#file). The data comes as a matrix. The first two columns are strings; the last one is a numeric value that must be converted."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"with open(\"matrix_distance_7398.txt\", \"r\") as f:\n",
"    matrice = [row.strip(' \\n').split('\\t') for row in f]\n",
"for row in matrice:\n",
"    row[2] = float(row[2])  # convert the distance to a number\n",
"print(matrice[:5])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[['Boulogne-Billancourt', 'Beauvais', 85597.0], ['Courbevoie', 'Sevran', 26564.0], ['Colombes', 'Alfortville', 36843.0], ['Bagneux', 'Marcq-En-Baroeul', 233455.0], ['Suresnes', 'Gennevilliers', 10443.0]]\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
@@ -330,6 +416,148 @@
"What are the costs of the two algorithms (shortest path and ski)?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3 id=\"prol\">Extensions: degrees of separation on Facebook</h3>\n",
"\n",
"The shortest path in a graph is one of the best-known algorithms in programming. It finds the solution at a **polynomial** cost: each iteration is in $O(n^2)$. Dynamic programming marks the move from a combinatorial view of a problem to a recursive understanding of the same problem. For the shortest path, the combinatorial approach enumerates every path in the graph. The dynamic approach shows that this combinatorial enumeration leads to a highly redundant computation. Let $e(v,w)$ denote the matrix of road lengths, with $e(v,w) = \\infty$ when there is no road between cities $v$ and $w$. We assume that $e(v,w)=e(w,v)$. The array ``d`` is built iteratively and recursively as follows:\n",
"\n",
"**Step 0**\n",
"\n",
"$$d(v) = \\infty, \\, \\forall v \\in V$$\n",
"\n",
"**Step $n$**\n",
"\n",
"$$ d(v) = \\left \\{ \\begin{array}{ll} 0 & \\text{if } v = \\text{Charleville-Mezieres} \\\\ \\min \\{ d(w) + e(v,w) \\, | \\, w \\in V \\} & \\text{otherwise} \\end{array} \\right . $$\n",
"\n",
"\n",
"As long as step $n$ keeps making updates ($\\sum_v d(v)$ decreases), repeat step $n$. The same algorithm can be applied to determine the [degree of separation](http://www.atlantico.fr/decryptage/theorie-six-degres-separation-relations-entre-individus-facebook-nombre-amis-229803.html) in a social network. The algorithm applies almost unchanged, provided you define what a city and a distance between cities are in this new graph. You can test your ideas on this example graph: [Social circles: Facebook](http://snap.stanford.edu/data/egonets-Facebook.html). [Dijkstra](http://fr.wikipedia.org/wiki/Algorithme_de_Dijkstra)'s algorithm computes the shortest path between two nodes of a graph; the [Bellman-Ford](http://fr.wikipedia.org/wiki/Algorithme_de_Bellman-Ford) algorithm is a variant that computes the shortest-path distances from one node to every other node of a graph."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pyensae # requires pyensae >= 0.8\n",
"files = pyensae.download_data(\"facebook.tar.gz\",website=\"http://snap.stanford.edu/data/\")\n",
"fe = [ f for f in files if \"edge\" in f ]\n",
"fe"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"['.\\\\facebook/0.edges',\n",
" '.\\\\facebook/348.edges',\n",
" '.\\\\facebook/414.edges',\n",
" '.\\\\facebook/698.edges',\n",
" '.\\\\facebook/107.edges',\n",
" '.\\\\facebook/3437.edges',\n",
" '.\\\\facebook/3980.edges',\n",
" '.\\\\facebook/1912.edges',\n",
" '.\\\\facebook/1684.edges',\n",
" '.\\\\facebook/686.edges']"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This file must be decompressed with [7zip](http://www.7-zip.org/) if you use ``pyensae < 0.8``. On Linux (and Mac), use the command described here: [tar](http://doc.ubuntu-fr.org/tar)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas\n",
"df = pandas.read_csv(\"facebook/1912.edges\", sep=\" \", names=[\"v1\",\"v2\"])\n",
"print(df.shape)\n",
"df.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(60050, 2)\n"
]
},
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>v1</th>\n",
" <th>v2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2290</td>\n",
" <td> 2363</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2346</td>\n",
" <td> 2025</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2140</td>\n",
" <td> 2428</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2201</td>\n",
" <td> 2506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2425</td>\n",
" <td> 2557</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
" v1 v2\n",
"0 2290 2363\n",
"1 2346 2025\n",
"2 2140 2428\n",
"3 2201 2506\n",
"4 2425 2557"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
Expand Down
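
One possible reading of the update rule in the `prol` cell, given as a minimal sketch and not as the notebook's own solution: the function below repeats the relaxation step over an edge list until no distance changes. The name `shortest_distances`, the toy `roads` and `friends` lists, and the choice of a distance of 1 per friendship are assumptions made for illustration. Edge lengths are assumed to be non-negative.

```python
import math


def shortest_distances(edges, source):
    """Repeat the update step until no distance improves.

    edges  : list of (v, w, length) triples for undirected links,
             so that e(v, w) = e(w, v); lengths are assumed >= 0.
    source : the node whose distance is pinned at 0.
    Returns a dict node -> distance from source (math.inf if unreachable).
    """
    edges = list(edges)  # the edges are scanned several times

    # Step 0: every distance is infinite, except the source.
    d = {}
    for v, w, _ in edges:
        d[v] = math.inf
        d[w] = math.inf
    d[source] = 0.0

    # Step n: relax every edge, and start again while something changed.
    updated = True
    while updated:
        updated = False
        for v, w, length in edges:
            if d[w] + length < d[v]:
                d[v] = d[w] + length
                updated = True
            if d[v] + length < d[w]:
                d[w] = d[v] + length
                updated = True
    return d


# Hypothetical toy road network (not taken from matrix_distance_7398.txt).
roads = [("A", "B", 4.0), ("B", "C", 3.0), ("A", "C", 10.0), ("D", "E", 1.0)]
dist = shortest_distances(roads, "A")
# dist["C"] is 7.0: going through B beats the direct road of length 10.
# dist["D"] stays infinite: D cannot be reached from A.

# Hypothetical friendship pairs: each friendship counts as a distance of 1,
# so the result is the degree of separation from person 1.
friends = [(1, 2), (2, 3), (3, 4), (1, 5)]
degrees = shortest_distances([(a, b, 1) for a, b in friends], 1)
# degrees[4] is 3: the chain is 1 - 2 - 3 - 4.
```

With the real data, the triples could presumably be taken from `df.values` for the cities, and the pairs from one of the `.edges` files for the social graph. Since every friendship has the same length, a breadth-first search would give the same degrees with fewer passes; the loop above is kept because it mirrors the rule stated in the notebook.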
